[jira] [Created] (IMPALA-13154) Some tables are missing in Top-N Tables with Highest Memory Requirements

2024-06-12 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13154:
---

 Summary: Some tables are missing in Top-N Tables with Highest 
Memory Requirements
 Key: IMPALA-13154
 URL: https://issues.apache.org/jira/browse/IMPALA-13154
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Reporter: Quanlong Huang


In the /catalog page of the catalogd WebUI, there is a table for "Top-N Tables 
with Highest Memory Requirements". However, not all tables are counted there. 
E.g. after starting catalogd, run a DESCRIBE on a table to trigger metadata 
loading on it. When loading finishes, the table is still not shown in the WebUI.

The cause is that the list is only updated in HdfsTable.getTHdfsTable() when 
'type' is 
ThriftObjectType.FULL:
[https://github.com/apache/impala/blob/ee21427d26620b40d38c706b4944d2831f84f6f5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L2457-L2459]

This used to be a place that all code paths using the table would go through. 
However, we've since done a bunch of optimizations that avoid getting the FULL 
thrift object of the table. We should move the code that updates the list of 
largest tables somewhere all table usages can reach, e.g. we can update the 
table's estimatedMetadataSize right after loading its metadata.
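A minimal sketch of the proposed direction, assuming we keep a catalog-wide top-N tracker that every load path updates (all class and method names here are illustrative, not Impala's actual API):

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class LargestTablesTracker {
  private static final int TOP_N = 3;
  // Smallest entry at the head, so exceeding TOP_N evicts the smallest table.
  private final PriorityQueue<Map.Entry<String, Long>> topTables =
      new PriorityQueue<>(Map.Entry.comparingByValue());

  /** Call right after a table's metadata is loaded and its
   *  estimatedMetadataSize is (re)computed. */
  public synchronized void updateLargestTables(String tableName,
      long estimatedMetadataSize) {
    topTables.removeIf(e -> e.getKey().equals(tableName)); // drop a stale entry
    topTables.add(new AbstractMap.SimpleEntry<>(tableName, estimatedMetadataSize));
    if (topTables.size() > TOP_N) topTables.poll();
  }

  /** Names of the tracked tables, largest first (what the /catalog page shows). */
  public synchronized List<String> topTableNames() {
    List<Map.Entry<String, Long>> sorted = new ArrayList<>(topTables);
    sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed());
    List<String> names = new ArrayList<>();
    for (Map.Entry<String, Long> e : sorted) names.add(e.getKey());
    return names;
  }
}
```

With this shape, getTHdfsTable() no longer needs to be on the update path; any code that refreshes estimatedMetadataSize can call updateLargestTables().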



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13152) IllegalStateException in computing processing cost when there are predicates on analytic output columns

2024-06-11 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853924#comment-17853924
 ] 

Quanlong Huang commented on IMPALA-13152:
-

Assigning this to [~rizaon], who knows more about this.

> IllegalStateException in computing processing cost when there are predicates 
> on analytic output columns
> ---
>
> Key: IMPALA-13152
> URL: https://issues.apache.org/jira/browse/IMPALA-13152
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Quanlong Huang
>Assignee: Riza Suminto
>Priority: Major
>
> Saw an error in the following query when COMPUTE_PROCESSING_COST is on:
> {code:sql}
> create table tbl (a int, b int, c int);
> set COMPUTE_PROCESSING_COST=1;
> explain select a, b from (
>   select a, b, c,
> row_number() over(partition by a order by b desc) as latest
>   from tbl
> )b
> WHERE latest=1
> ERROR: IllegalStateException: Processing cost of PlanNode 01:TOP-N is invalid!
> {code}
> Exception in the logs:
> {noformat}
> I0611 13:04:37.192874 28004 jni-util.cc:321] 
> 264ee79bfb6ac031:42f8006c] java.lang.IllegalStateException: 
> Processing cost of PlanNode 01:TOP-N is invalid!
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:512)
> at 
> org.apache.impala.planner.PlanNode.computeRowConsumptionAndProductionToCost(PlanNode.java:1047)
> at 
> org.apache.impala.planner.PlanFragment.computeCostingSegment(PlanFragment.java:287)
> at 
> org.apache.impala.planner.Planner.computeProcessingCost(Planner.java:560)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1932)
> at 
> org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:2892)
> at 
> org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2676)
> at 
> org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
> at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175){noformat}
> The error does not occur if the predicate "latest=1" is removed.






[jira] [Created] (IMPALA-13152) IllegalStateException in computing processing cost when there are predicates on analytic output columns

2024-06-10 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13152:
---

 Summary: IllegalStateException in computing processing cost when 
there are predicates on analytic output columns
 Key: IMPALA-13152
 URL: https://issues.apache.org/jira/browse/IMPALA-13152
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Reporter: Quanlong Huang
Assignee: Riza Suminto


Saw an error in the following query when COMPUTE_PROCESSING_COST is on:
{code:sql}
create table tbl (a int, b int, c int);

set COMPUTE_PROCESSING_COST=1;

explain select a, b from (
  select a, b, c,
row_number() over(partition by a order by b desc) as latest
  from tbl
)b
WHERE latest=1

ERROR: IllegalStateException: Processing cost of PlanNode 01:TOP-N is invalid!
{code}
Exception in the logs:
{noformat}
I0611 13:04:37.192874 28004 jni-util.cc:321] 264ee79bfb6ac031:42f8006c] 
java.lang.IllegalStateException: Processing cost of PlanNode 01:TOP-N is 
invalid!
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:512)
at 
org.apache.impala.planner.PlanNode.computeRowConsumptionAndProductionToCost(PlanNode.java:1047)
at 
org.apache.impala.planner.PlanFragment.computeCostingSegment(PlanFragment.java:287)
at 
org.apache.impala.planner.Planner.computeProcessingCost(Planner.java:560)
at 
org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1932)
at 
org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:2892)
at 
org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2676)
at 
org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
at 
org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
at 
org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175){noformat}
The error does not occur if the predicate "latest=1" is removed.
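For reference, the failing check is a Guava Preconditions.checkState() in PlanNode.computeRowConsumptionAndProductionToCost(). Below is a stand-alone stand-in for this kind of validity guard; the names and the invalid-cost condition (a negative cost propagated from an unknown cardinality) are assumptions for illustration, not Impala's actual costing logic:

```java
// Stand-in for the guard that throws in
// PlanNode.computeRowConsumptionAndProductionToCost(). The invalid-cost
// condition below (negative cost from an unknown cardinality) is an
// illustrative assumption, not Impala's actual costing logic.
class ProcessingCostCheck {
  static final long UNKNOWN = -1;

  /** A processing cost is considered valid only when non-negative. */
  static boolean isValid(long cost) { return cost >= 0; }

  /** An unknown (negative) cardinality propagates as an invalid cost. */
  static long costFromCardinality(long cardinality, long perRowCost) {
    return cardinality < 0 ? UNKNOWN : cardinality * perRowCost;
  }

  /** Mirrors the Preconditions.checkState(...) call in the stack trace above. */
  static void checkCost(String nodeName, long cost) {
    if (!isValid(cost)) {
      throw new IllegalStateException(
          "Processing cost of PlanNode " + nodeName + " is invalid!");
    }
  }
}
```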







[jira] [Commented] (IMPALA-13093) Insert into Huawei OBS table failed

2024-06-10 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853843#comment-17853843
 ] 

Quanlong Huang commented on IMPALA-13093:
-

It seems adding this to hdfs-site.xml can also fix the issue:
{code:xml}
<property>
  <name>fs.obs.file.visibility.enable</name>
  <value>true</value>
</property>
{code}
I'll check whether OBS returns the real block size.
CC [~michaelsmith] [~eyizoha]

> Insert into Huawei OBS table failed
> ---
>
> Key: IMPALA-13093
> URL: https://issues.apache.org/jira/browse/IMPALA-13093
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Affects Versions: Impala 4.3.0
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> Insert into a table using Huawei OBS (Object Storage Service) as the storage 
> will fail with the following error:
> {noformat}
> Query: insert into test_obs1 values (1, 'abc')
> ERROR: Failed to get info on temporary HDFS file: 
> obs://obs-test-ee93/input/test_obs1/_impala_insert_staging/fe4ac1be6462a13f_362a9b5b/.fe4ac1be6462a13f-362a9b5b_1213692075_dir//fe4ac1be6462a13f-362a9b5b_375832652_data.0.txt
> Error(2): No such file or directory {noformat}
> Looking into the logs:
> {noformat}
> I0516 16:40:55.663640 18922 status.cc:129] fe4ac1be6462a13f:362a9b5b] 
> Failed to get info on temporary HDFS file: 
> obs://obs-test-ee93/input/test_obs1/_impala_insert_staging/fe4ac1be6462a13f_362a9b5b/.fe4ac1be6462a13f-362a9b5b_1213692075_dir//fe4ac1be6462a13f-362a9b5b_375832652_data.0.txt
> Error(2): No such file or directory
> @   0xfc6d44  impala::Status::Status()
> @  0x1c42020  impala::HdfsTableSink::CreateNewTmpFile()
> @  0x1c44357  impala::HdfsTableSink::InitOutputPartition()
> @  0x1c4988a  impala::HdfsTableSink::GetOutputPartition()
> @  0x1c46569  impala::HdfsTableSink::Send()
> @  0x14ee25f  impala::FragmentInstanceState::ExecInternal()
> @  0x14efca3  impala::FragmentInstanceState::Exec()
> @  0x148dc4c  impala::QueryState::ExecFInstance()
> @  0x1b3bab9  impala::Thread::SuperviseThread()
> @  0x1b3cdb1  boost::detail::thread_data<>::run()
> @  0x2474a87  thread_proxy
> @ 0x7fe5a562dea5  start_thread
> @ 0x7fe5a25ddb0d  __clone{noformat}
> Note that impalad is started with {{--symbolize_stacktrace=true}} so the 
> stacktrace has symbols.






[jira] [Created] (IMPALA-13149) Show JVM info in the WebUI

2024-06-09 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13149:
---

 Summary: Show JVM info in the WebUI
 Key: IMPALA-13149
 URL: https://issues.apache.org/jira/browse/IMPALA-13149
 Project: IMPALA
  Issue Type: New Feature
Reporter: Quanlong Huang


It'd be helpful to show the JVM info in the WebUI, e.g. show the output of 
"java -version":
{code:java}
openjdk version "1.8.0_412"
OpenJDK Runtime Environment (build 1.8.0_412-b08)
OpenJDK 64-Bit Server VM (build 25.412-b08, mixed mode){code}
On nodes that only have a JRE deployed, we'd like to deploy the same version of 
the JDK to perform heap dumps (jmap), so showing the JVM info in the WebUI 
would be useful.
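As a sketch, the same details printed by "java -version" are available in-process through standard JVM system properties, so the WebUI handler would not need to shell out (how and where catalogd exposes the string is left open):

```java
// Collects "java -version"-style details from standard JVM system
// properties. Exposing the string on the WebUI is out of scope here.
class JvmInfo {
  static String jvmVersionString() {
    return System.getProperty("java.runtime.name") + " (build "
        + System.getProperty("java.runtime.version") + ")\n"
        + System.getProperty("java.vm.name") + " (build "
        + System.getProperty("java.vm.version") + ", "
        + System.getProperty("java.vm.info") + ")";
  }
}
```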






[jira] [Updated] (IMPALA-13148) Show the number of in-progress Catalog operations

2024-06-09 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-13148:

Attachment: Selection_123.png
Selection_122.png

> Show the number of in-progress Catalog operations
> -
>
> Key: IMPALA-13148
> URL: https://issues.apache.org/jira/browse/IMPALA-13148
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Quanlong Huang
>Priority: Major
>  Labels: newbie, ramp-up
> Attachments: Selection_122.png, Selection_123.png
>
>
> In the /operations page of the catalogd WebUI, the list of In-progress Catalog 
> Operations is shown. It'd be helpful to also show the number of such 
> operations, like the /queries page of the coordinator WebUI, which shows e.g. 
> "100 queries in flight".






[jira] [Created] (IMPALA-13148) Show the number of in-progress Catalog operations

2024-06-09 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13148:
---

 Summary: Show the number of in-progress Catalog operations
 Key: IMPALA-13148
 URL: https://issues.apache.org/jira/browse/IMPALA-13148
 Project: IMPALA
  Issue Type: Improvement
Reporter: Quanlong Huang
 Attachments: Selection_122.png, Selection_123.png

In the /operations page of the catalogd WebUI, the list of In-progress Catalog 
Operations is shown. It'd be helpful to also show the number of such 
operations, like the /queries page of the coordinator WebUI, which shows e.g. 
"100 queries in flight".







[jira] [Created] (IMPALA-13126) ReloadEvent.isOlderEvent() should hold the table read lock

2024-06-03 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13126:
---

 Summary: ReloadEvent.isOlderEvent() should hold the table read lock
 Key: IMPALA-13126
 URL: https://issues.apache.org/jira/browse/IMPALA-13126
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Reporter: Quanlong Huang
Assignee: Sai Hemanth Gantasala


Saw an exception like this:
{noformat}
E0601 09:11:25.275251   246 MetastoreEventsProcessor.java:990] Unexpected 
exception received while processing event
Java exception follows:
java.util.ConcurrentModificationException
at java.util.HashMap$HashIterator.nextNode(HashMap.java:1469)
at java.util.HashMap$ValueIterator.next(HashMap.java:1498)
at 
org.apache.impala.catalog.FeFsTable$Utils.getPartitionFromThriftPartitionSpec(FeFsTable.java:616)
at 
org.apache.impala.catalog.HdfsTable.getPartitionFromThriftPartitionSpec(HdfsTable.java:597)
at org.apache.impala.catalog.Catalog.getHdfsPartition(Catalog.java:511)
at org.apache.impala.catalog.Catalog.getHdfsPartition(Catalog.java:489)
at 
org.apache.impala.catalog.CatalogServiceCatalog.isPartitionLoadedAfterEvent(CatalogServiceCatalog.java:4024)
at 
org.apache.impala.catalog.events.MetastoreEvents$ReloadEvent.isOlderEvent(MetastoreEvents.java:2754)
at 
org.apache.impala.catalog.events.MetastoreEvents$ReloadEvent.processTableEvent(MetastoreEvents.java:2729)
at 
org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.process(MetastoreEvents.java:1107)
at 
org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:531)
at 
org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1164)
at 
org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:972)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750) {noformat}
For a partition-level RELOAD event, ReloadEvent.isOlderEvent() needs to check 
whether the corresponding partition was reloaded after the event. This should 
be done while holding the table read lock. Otherwise, EventProcessor can hit 
the error above when there are concurrent DDLs/DMLs modifying the partition 
list.

CC [~VenuReddy]
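The failure mode can be reproduced in miniature: iterating a HashMap (like the partition map) while another code path structurally modifies it fails fast with ConcurrentModificationException. Below is a sketch of the suggested fix, guarding the lookup with a read lock; the class and method names are illustrative, not Impala's actual classes:

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class PartitionLookup {
  private final Map<String, Long> partitionIds = new HashMap<>();
  private final ReadWriteLock tableLock = new ReentrantReadWriteLock();

  /** Writers (concurrent DDL/DML updating the partition list) take the
   *  write lock. */
  void addPartition(String spec, long id) {
    tableLock.writeLock().lock();
    try { partitionIds.put(spec, id); }
    finally { tableLock.writeLock().unlock(); }
  }

  /** The lookup done for isOlderEvent() takes the read lock, so no writer
   *  can structurally change the map while we scan it. */
  Long findPartition(String spec) {
    tableLock.readLock().lock();
    try {
      for (Map.Entry<String, Long> e : partitionIds.entrySet()) {
        if (e.getKey().equals(spec)) return e.getValue();
      }
      return null;
    } finally { tableLock.readLock().unlock(); }
  }

  /** Single-threaded stand-in for the race: a structural change during
   *  iteration makes HashMap fail fast, as in the stack trace above. */
  static boolean unsafeScanThrows() {
    Map<String, Long> m = new HashMap<>();
    m.put("p=1", 1L);
    m.put("p=2", 2L);
    try {
      for (String ignored : m.keySet()) m.put("p=3", 3L); // mutate mid-scan
      return false;
    } catch (ConcurrentModificationException e) {
      return true;
    }
  }
}
```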







[jira] [Created] (IMPALA-13122) Show file stats in table loading logs

2024-06-02 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13122:
---

 Summary: Show file stats in table loading logs
 Key: IMPALA-13122
 URL: https://issues.apache.org/jira/browse/IMPALA-13122
 Project: IMPALA
  Issue Type: Improvement
  Components: Catalog
Reporter: Quanlong Huang


Here is an example for table loading logs on a table:
{noformat}
I0603 08:46:05.67 24417 HdfsTable.java:1255] Loading metadata for table 
definition and all partition(s) of tpcds.store_sales (needed by coordinator)
I0603 08:46:05.642702 24417 HdfsTable.java:1896] Loaded 23 columns from HMS. 
Actual columns: 23
I0603 08:46:05.767457 24417 HdfsTable.java:3114] Load Valid Write Id List Done. 
Time taken: 26.699us
I0603 08:46:05.767549 24417 HdfsTable.java:1297] Fetching partition metadata 
from the Metastore: tpcds.store_sales
I0603 08:46:05.806337 24417 MetaStoreUtil.java:190] Fetching 1824 partitions 
for: tpcds.store_sales using partition batch size: 1000 
I0603 08:46:07.336064 24417 MetaStoreUtil.java:208] Fetched 1000/1824 
partitions for table tpcds.store_sales
I0603 08:46:07.915474 24417 MetaStoreUtil.java:208] Fetched 1824/1824 
partitions for table tpcds.store_sales
I0603 08:46:07.915519 24417 HdfsTable.java:1304] Fetched partition metadata 
from the Metastore: tpcds.store_sales
I0603 08:46:08.840034 24417 ParallelFileMetadataLoader.java:224] Loading file 
and block metadata for 1824 paths for table tpcds.store_sales using a thread 
pool of size 5
I0603 08:46:09.383904 24417 HdfsTable.java:836] Loaded file and block metadata 
for tpcds.store_sales partitions: ss_sold_date_sk=2450816, 
ss_sold_date_sk=2450817, ss_sold_date_sk=2450818, and 1821 others. Time taken: 
569.107ms
I0603 08:46:09.420702 24417 Table.java:1117] last refreshed event id for table: 
tpcds.store_sales set to: -1
I0603 08:46:09.420794 24417 TableLoader.java:177] Loaded metadata for: 
tpcds.store_sales (4026ms){noformat}
From the logs, we know the table has 23 columns and 1824 partitions. Time 
spent in loading the table schema and file metadata is also shown.

However, it's unknown whether there is a small-files issue under the 
partitions. The underlying storage could also be slow (e.g. S3), which results 
in a long time loading file metadata.

It'd be helpful to add these in the logs:
 * number of files loaded
 * min/avg/max of file sizes
 * total file size
 * number of blocks (HDFS only)
 * number of hosts, disks (HDFS/Ozone only)
 * stats of accessTime and lastModifiedTime

These can be aggregated in FileMetadataLoader#loadInternal() and logged in 
ParallelFileMetadataLoader#load() or HdfsTable#loadFileMetadataForPartitions().

[https://github.com/apache/impala/blob/9011b81afa33ef7e4b0ec8a367b2713be8917213/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java#L177]

[https://github.com/apache/impala/blob/9011b81afa33ef7e4b0ec8a367b2713be8917213/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java#L172]

[https://github.com/apache/impala/blob/ee21427d26620b40d38c706b4944d2831f84f6f5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L836]
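The per-table aggregation could be as simple as feeding the per-file sizes collected in FileMetadataLoader#loadInternal() into a LongSummaryStatistics and logging one summary line. A sketch (the class and method names below are illustrative, not Impala's):

```java
import java.util.List;
import java.util.Locale;
import java.util.LongSummaryStatistics;

class FileStatsAggregator {
  /** Summarizes a non-empty list of per-file sizes (bytes) into one log line. */
  static String summarize(List<Long> fileSizesBytes) {
    LongSummaryStatistics stats =
        fileSizesBytes.stream().mapToLong(Long::longValue).summaryStatistics();
    return String.format(Locale.ROOT,
        "files=%d totalSize=%d minSize=%d avgSize=%.1f maxSize=%d",
        stats.getCount(), stats.getSum(), stats.getMin(),
        stats.getAverage(), stats.getMax());
  }
}
```

The same pattern extends to block counts and accessTime/lastModifiedTime stats, with one LongSummaryStatistics per metric.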







[jira] [Created] (IMPALA-13117) Improve the heap usage during metadata loading and DDL/DML executions

2024-05-30 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13117:
---

 Summary: Improve the heap usage during metadata loading and 
DDL/DML executions
 Key: IMPALA-13117
 URL: https://issues.apache.org/jira/browse/IMPALA-13117
 Project: IMPALA
  Issue Type: Improvement
  Components: Catalog
Reporter: Quanlong Huang
Assignee: Quanlong Huang


The JVM heap of catalogd is not just used by the metadata cache. In-progress 
metadata loading threads and DDL/DML executions also create temporary objects, 
which introduce spikes in heap usage. We should improve the heap usage in this 
area, especially when metadata loading is slow due to external slowness (e.g. 
listing files on S3).

CC [~mylogi...@gmail.com] 






[jira] [Assigned] (IMPALA-13116) In local-catalog mode, abort REFRESH and metadata reloading of DDL/DMLs if the table is invalidated

2024-05-30 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang reassigned IMPALA-13116:
---

Assignee: Quanlong Huang

> In local-catalog mode, abort REFRESH and metadata reloading of DDL/DMLs if 
> the table is invalidated
> ---
>
> Key: IMPALA-13116
> URL: https://issues.apache.org/jira/browse/IMPALA-13116
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> A table can be invalidated while there are DDL/DML/REFRESHs running in flight:
>  * Users can explicitly trigger an INVALIDATE METADATA command
>  * The table could be invalidated by CatalogdTableInvalidator when 
> invalidate_tables_on_memory_pressure or invalidate_tables_timeout_s is turned 
> on
> Note that invalidating a table doesn't require holding the lock of the 
> HdfsTable object so it can finish even if there are on-going updates on the 
> table.
> The updated HdfsTable object won't be added to the metadata cache since it 
> has been replaced with an IncompleteTable object. It's only used in the 
> DDL/DML/REFRESH responses. In local catalog mode, the response is the minimal 
> representation which is mostly the table name and catalog version. We don't 
> need the updates on the HdfsTable object to be finished. Thus, we can 
> consider aborting the reloading of such DDL/DML/REFRESH requests.






[jira] [Created] (IMPALA-13117) Improve the heap usage during metadata loading and DDL/DML executions

2024-05-30 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13117:
---

 Summary: Improve the heap usage during metadata loading and 
DDL/DML executions
 Key: IMPALA-13117
 URL: https://issues.apache.org/jira/browse/IMPALA-13117
 Project: IMPALA
  Issue Type: Improvement
  Components: Catalog
Reporter: Quanlong Huang
Assignee: Quanlong Huang


The JVM heap of catalogd is not used only by the metadata cache. In-progress 
metadata loading threads and DDL/DML executions also create temporary objects, 
which introduce spikes in heap usage. We should improve the heap usage of this 
part, especially when metadata loading is slow due to external slowness (e.g. 
listing files on S3).

CC [~mylogi...@gmail.com] 





[jira] [Created] (IMPALA-13116) In local-catalog mode, abort REFRESH and metadata reloading of DDL/DMLs if the table is invalidated

2024-05-30 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13116:
---

 Summary: In local-catalog mode, abort REFRESH and metadata 
reloading of DDL/DMLs if the table is invalidated
 Key: IMPALA-13116
 URL: https://issues.apache.org/jira/browse/IMPALA-13116
 Project: IMPALA
  Issue Type: Improvement
  Components: Catalog
Reporter: Quanlong Huang


A table can be invalidated while there are DDL/DML/REFRESH operations running in flight:
 * A user can explicitly trigger an INVALIDATE METADATA command
 * The table can be invalidated by CatalogdTableInvalidator when 
invalidate_tables_on_memory_pressure or invalidate_tables_timeout_s is turned on

Note that invalidating a table doesn't require holding the lock of the 
HdfsTable object so it can finish even if there are on-going updates on the 
table.

The updated HdfsTable object won't be added to the metadata cache since it has 
been replaced with an IncompleteTable object. It's only used in the 
DDL/DML/REFRESH responses. In local catalog mode, the response is the minimal 
representation which is mostly the table name and catalog version. We don't 
need the updates on the HdfsTable object to be finished. Thus, we can consider 
aborting the reloading of such DDL/DML/REFRESH requests.
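The abort described above amounts to a compare-before-commit on the cache entry. Below is a minimal sketch with made-up names (a ConcurrentHashMap stands in for the catalog cache; this is not Impala's actual code): a reload commits its result only if the cached entry is still the object it started from, so a reload that raced with an invalidation is simply dropped and the placeholder is kept.

```java
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only: the cache maps table names to table objects.
// An invalidation replaces the entry with a placeholder; a late reload then
// fails the compare-and-swap and is aborted instead of clobbering it.
public class AbortStaleReload {
    static final ConcurrentHashMap<String, Object> cache = new ConcurrentHashMap<>();

    // Commit the reloaded value only if the table was not invalidated meanwhile.
    static boolean commitReload(String table, Object loadedFrom, Object reloaded) {
        return cache.replace(table, loadedFrom, reloaded);
    }

    public static void main(String[] args) {
        Object loaded = new Object();          // table as it was when the reload started
        cache.put("t", loaded);

        Object placeholder = new Object();     // INVALIDATE METADATA swaps in a placeholder
        cache.put("t", placeholder);

        Object reloaded = new Object();        // the reload finishes late
        boolean committed = commitReload("t", loaded, reloaded);
        System.out.println(committed);                    // false: reload aborted
        System.out.println(cache.get("t") == placeholder); // true: placeholder kept
    }
}
```

The minimal DDL/DML/REFRESH response (table name plus catalog version) can still be returned to the client even when the commit is skipped.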







[jira] [Created] (IMPALA-13115) Always add the query id in the error message to clients

2024-05-29 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13115:
---

 Summary: Always add the query id in the error message to clients
 Key: IMPALA-13115
 URL: https://issues.apache.org/jira/browse/IMPALA-13115
 Project: IMPALA
  Issue Type: Improvement
  Components: Backend
Reporter: Quanlong Huang


We have some errors like "Failed due to unreachable impalad(s)". We should 
improve them to mention the query id, e.g. "Query ${query_id} failed due to 
unreachable impalad(s)". In a busy cluster, queries are flushed out of the 
/queries page quickly, and coordinator logs are also rotated quickly, so it's 
hard to find the query id there.
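A minimal sketch of the suggested change (the helper name is hypothetical, not an existing Impala function): wrap client-facing error strings so they always carry the query id.

```java
public class QueryIdError {
    // Hypothetical helper: prefix every client-facing error with the query id
    // so it stays searchable after the /queries page and logs rotate.
    static String withQueryId(String queryId, String error) {
        return "Query " + queryId + " "
            + error.substring(0, 1).toLowerCase() + error.substring(1);
    }

    public static void main(String[] args) {
        System.out.println(withQueryId("d14a2b3c4e5f6a7b:0000000000000000",
            "Failed due to unreachable impalad(s)"));
        // → Query d14a2b3c4e5f6a7b:0000000000000000 failed due to unreachable impalad(s)
    }
}
```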







[jira] [Assigned] (IMPALA-12834) Add query load information to the query profile

2024-05-27 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang reassigned IMPALA-12834:
---

Assignee: YifanZhang

> Add query load information to the query profile
> ---
>
> Key: IMPALA-12834
> URL: https://issues.apache.org/jira/browse/IMPALA-12834
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Perf Investigation
>Reporter: YifanZhang
>Assignee: YifanZhang
>Priority: Minor
> Fix For: Impala 4.4.0
>
>
> Add query load information to the query profile to track if the performance 
> regression is related to the insufficient resources of the node, and also 
> recommend if the current pool configurations or host configurations are 
> optimal.
> The load information should include:
>  * Number of running queries of the executor group on which the query is 
> scheduled
>  * Number of running fragment instances of the hosts on which the query is 
> scheduled
>  * Used/Reserved memory of the hosts on which the query is scheduled
>  * Some other useful metrics






[jira] [Updated] (IMPALA-12182) Add CPU utilization time series graph for RuntimeProfile's sampled values

2024-05-27 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-12182:

Fix Version/s: Impala 4.3.0

> Add CPU utilization time series graph for RuntimeProfile's sampled values
> -
>
> Key: IMPALA-12182
> URL: https://issues.apache.org/jira/browse/IMPALA-12182
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Major
> Fix For: Impala 4.3.0
>
> Attachments: 23-07-10_T15_33_44.png, 23-07-10_T15_36_26.png, 
> 23-07-10_T15_39_01.png, 23-07-10_T15_39_31.png, 23-07-10_T15_40_42.png, 
> 23-07-10_T15_40_50.png, 23-07-10_T15_40_55.png, cpu_utilization.png, 
> cpu_utilization_test-1.png, cpu_utilization_test-2.png, query_timeline.mkv, 
> simplescreenrecorder-2023-07-10_21.10.58.mkv, 
> simplescreenrecorder-2023-07-10_22.10.18.mkv, three_nodes.png, 
> three_nodes_zoomed_out.png, timeseries_cpu_utilization_line_plot.mkv, 
> two_nodes.png
>
>
> The RuntimeProfile contains samples of CPU utilization metrics for user, sys 
> and iowait clamped to 64 values (retrieved from the ChunkedTimeSeriesCounter, 
> but sampled similarly to SamplingTimeSeriesCounter).
> It would be helpful to see the recent aggregate CPU utilization samples 
> for each of the different nodes.
> These are sampled every `periodic_counter_update_period_ms`.
> The AggregatedRuntimeProfile used in the Thrift profile contains the complete 
> series of values from the ChunkedTimeSeriesCounter samples. However, as this 
> representation is difficult to provide in JSON, the values have been 
> downsampled to 64 values.
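The downsampling mentioned above can be illustrated by bucket-averaging: a generic sketch, not Impala's SamplingTimeSeriesCounter or ChunkedTimeSeriesCounter implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: reduce an arbitrarily long series of counter samples to
// at most maxPoints values by averaging fixed-size buckets of samples.
public class Downsample {
    static List<Double> toAtMost(int maxPoints, List<Double> samples) {
        if (samples.size() <= maxPoints) return samples;
        // Bucket size is rounded up so the output never exceeds maxPoints.
        int bucket = (int) Math.ceil(samples.size() / (double) maxPoints);
        List<Double> out = new ArrayList<>();
        for (int i = 0; i < samples.size(); i += bucket) {
            double sum = 0;
            int n = 0;
            for (int j = i; j < Math.min(i + bucket, samples.size()); j++) {
                sum += samples.get(j);
                n++;
            }
            out.add(sum / n);  // one averaged point per bucket
        }
        return out;
    }

    public static void main(String[] args) {
        List<Double> s = new ArrayList<>();
        for (int i = 0; i < 200; i++) s.add((double) i);
        System.out.println(toAtMost(64, s).size());  // ≤ 64 points
    }
}
```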






[jira] [Updated] (IMPALA-12364) Display disk and network metrics in webUI's query timeline

2024-05-27 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-12364:

Fix Version/s: Impala 4.4.0

> Display disk and network metrics in webUI's query timeline
> --
>
> Key: IMPALA-12364
> URL: https://issues.apache.org/jira/browse/IMPALA-12364
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Major
> Fix For: Impala 4.4.0
>
> Attachments: average_disk_network_metrics.mkv, 
> averaged_disk_network_metrics.png, both_charts_resize.mkv, 
> both_charts_resize.png, close_cpu_utilization_button.mkv, 
> draggable_resize_handle.png, hor_zoom_buttons.png, 
> horizontal_zoom_buttons.mkv, host_utilization_chart_resize.mkv, 
> host_utilization_close_button.png, host_utilization_resize_bar.png, 
> multiple_fragment_metrics.png, resize_drag_handle.mkv
>
>
> It would be helpful to display disk and network usage in human readable form 
> on the query timeline, aligning it along with the CPU utilization plot, below 
> the fragment timing diagram.






[jira] [Updated] (IMPALA-11915) Support timeline and graphical plan exports in the webUI

2024-05-27 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-11915:

Fix Version/s: Impala 4.3.0

> Support timeline and graphical plan exports in the webUI
> 
>
> Key: IMPALA-11915
> URL: https://issues.apache.org/jira/browse/IMPALA-11915
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Quanlong Huang
>Assignee: Surya Hebbar
>Priority: Major
>  Labels: supportability
> Fix For: Impala 4.3.0
>
> Attachments: export_button.png, export_modal.png, 
> export_plan_example_70b4ecc5f6aec963e_85221a3b_plan.html, 
> export_timeline_example_0b4ecc5f6aec963e_85221a3b_timeline.svg, 
> exported_plan.png, exported_timeline.png, plan_download.png, 
> plan_download_button.png, plan_export.png, plan_export_modal.png, 
> plan_export_text_selection.png, svg_wrapped_export.html, text_selection.png, 
> timeline_download-1.png, timeline_download.png, timeline_download_button.png, 
> timeline_export.png, timeline_export_modal.png, 
> timeline_export_text_selection-1.png, timeline_export_text_selection.png
>
>
> The graphical plan in the web UI is useful. It'd be nice to provide a button 
> to download the svg picture.






[jira] [Updated] (IMPALA-12178) Refined alignment of timeticks in the webUI timeline

2024-05-27 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-12178:

Fix Version/s: Impala 4.3.0

> Refined alignment of timeticks in the webUI timeline
> 
>
> Key: IMPALA-12178
> URL: https://issues.apache.org/jira/browse/IMPALA-12178
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Minor
> Fix For: Impala 4.3.0
>
> Attachments: overflowed_timetick_label.png, timetick_label_fixed.png
>
>
> The timeticks on the query timeline page in the WebUI were partially hidden 
> due to overflow of long timestamp labels after SVG rendering.
> It would be better if the entire timetick label were displayed appropriately. 
> !overflowed_timetick_label.png|width=808,height=259!






[jira] [Resolved] (IMPALA-13102) Loading tables with illegal stats failed

2024-05-23 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang resolved IMPALA-13102.
-
Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Loading tables with illegal stats failed
> 
>
> Key: IMPALA-13102
> URL: https://issues.apache.org/jira/browse/IMPALA-13102
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> When the table has illegal stats, e.g. numDVs=-100, Impala can't load the 
> table, so DROP STATS or DROP TABLE can't be performed on the table.
> {code:sql}
> [localhost:21050] default> drop stats alltypes_bak;
> Query: drop stats alltypes_bak
> ERROR: AnalysisException: Failed to load metadata for table: 'alltypes_bak'
> CAUSED BY: TableLoadingException: Failed to load metadata for table: 
> default.alltypes_bak
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, 
> avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, 
> numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}{code}
> We should allow at least dropping the stats or dropping the table, so the 
> user can use Impala to recover the stats.
> Stacktrace in the logs:
> {noformat}
> I0520 08:00:56.661746 17543 jni-util.cc:321] 
> 5343142d1173494f:44dcde8c] 
> org.apache.impala.common.AnalysisException: Failed to load metadata for 
> table: 'alltypes_bak'
> at 
> org.apache.impala.analysis.Analyzer.resolveTableRef(Analyzer.java:974)
> at 
> org.apache.impala.analysis.DropStatsStmt.analyze(DropStatsStmt.java:94)
> at 
> org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:551)
> at 
> org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:498)
> at 
> org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2542)
> at 
> org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
> at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175)
> Caused by: org.apache.impala.catalog.TableLoadingException: Failed to load 
> metadata for table: default.alltypes_bak
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, 
> avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, 
> numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
> at 
> org.apache.impala.catalog.IncompleteTable.loadFromThrift(IncompleteTable.java:162)
> at org.apache.impala.catalog.Table.fromThrift(Table.java:586)
> at 
> org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:479)
> at 
> org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334)
> at 
> org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262)
> at 
> org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114)
> at 
> org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:585)
> at 
> org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196)
> at .: 
> org.apache.impala.catalog.TableLoadingException: Failed to load metadata for 
> table: default.alltypes_bak
> at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1318)
> at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1213)
> at org.apache.impala.catalog.TableLoader.load(TableLoader.java:145)
> at 
> org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:251)
> at 
> org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:247)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750)
> Caused by: java.lang.IllegalStateException: ColumnStats{avgSize_=4.0, 
> avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, 
> numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:512)
> at 
> org.apache.impala.catalog.ColumnStats.validate(ColumnStats.java:1034)
> at org.apache.impala.catalog.ColumnStats.update(ColumnStats.java:676)
> at org.apache.impala.catalog.Column.updateStats(Column.java:73)
> at 
> org.apache.impala.catalog.FeCatalogUtils.injectColumnStats(FeCatalogUtils.java:183)
> at 

[jira] [Commented] (IMPALA-12190) Renaming table will cause losing privileges for non-admin users

2024-05-22 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-12190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848508#comment-17848508
 ] 

Quanlong Huang commented on IMPALA-12190:
-

Column masking and row filtering policies will also be messed up by RENAME. I 
think tag-based policies will also be messed up if data lineages are not 
updated accordingly.

+1 for a new Ranger API that returns all policies matching a given table (and 
optionally a given user). We also need this to improve IMPALA-11501 and avoid 
loading the table schema from HMS. Currently, to check whether a user has a 
corresponding column masking policy on a table, we have to load the table to 
get all the column names and check whether there is a policy on each column, 
which is inefficient.
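The per-column check described above can be sketched as follows. All names here are illustrative (policyExists stands in for a remote Ranger lookup, not a real API): without a bulk "policies for this table" call, the number of remote lookups grows linearly with the column count, and the full schema must be loaded first just to enumerate the columns.

```java
import java.util.List;
import java.util.function.BiPredicate;

// Illustrative sketch of the O(columns) pattern being criticized above.
public class PolicyCheck {
    static int lookups = 0;

    // One policy lookup per column; a bulk table-level API would need only one.
    static boolean hasAnyMaskingPolicy(List<String> columns,
                                       BiPredicate<String, String> policyExists,
                                       String user) {
        for (String col : columns) {
            lookups++;                          // models one remote call
            if (policyExists.test(user, col)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        boolean found = hasAnyMaskingPolicy(List.of("id", "name", "ssn"),
            (user, col) -> col.equals("ssn"), "alice");
        System.out.println(found + " after " + lookups + " lookups");
    }
}
```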

> Renaming table will cause losing privileges for non-admin users
> ---
>
> Key: IMPALA-12190
> URL: https://issues.apache.org/jira/browse/IMPALA-12190
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Gabor Kaszab
>Assignee: Sai Hemanth Gantasala
>Priority: Critical
>  Labels: alter-table, authorization, ranger
>
> Let's say user 'a' gets some privileges on table 't'. When this table gets 
> renamed (even by user 'a') then user 'a' loses its privileges on that table.
>  
> Repro steps:
>  # Start impala with Ranger
>  # start impala-shell as admin (-u admin)
>  # create table tmp (i int, s string) stored as parquet;
>  # grant all on table tmp to user ;
>  # grant all on table tmp to user ;
> {code:java}
> Query: show grant user  on table tmp
> +++--+---++-+--+-+-+---+--+-+
> | principal_type | principal_name | database | table | column | uri | 
> storage_type | storage_uri | udf | privilege | grant_option | create_time |
> +++--+---++-+--+-+-+---+--+-+
> | USER           |     | default  | tmp   | *      |     |          
>     |             |     | all       | false        | NULL        |
> +++--+---++-+--+-+-+---+--+-+
> Fetched 1 row(s) in 0.01s {code}
>  #  alter table tmp rename to tmp_1234;
>  # show grant user  on table tmp_1234;
> {code:java}
> Query: show grant user  on table tmp_1234
> Fetched 0 row(s) in 0.17s{code}






[jira] [Updated] (IMPALA-13074) WRITE TO HDFS node is omitted from Web UI graphic plan

2024-05-21 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-13074:

Labels: ramp-up  (was: )

> WRITE TO HDFS node is omitted from Web UI graphic plan
> --
>
> Key: IMPALA-13074
> URL: https://issues.apache.org/jira/browse/IMPALA-13074
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Noemi Pap-Takacs
>Priority: Major
>  Labels: ramp-up
>
> The query plan shows the nodes that take part in the execution, forming a 
> tree structure.
> It can be displayed in the CLI by issuing the EXPLAIN command. When the 
> actual query is executed, the plan tree can also be viewed in the Impala 
> Web UI in graphic form.
> However, the explain string and the graphic plan tree do not match: the top 
> node is missing from the Web UI.
> This is especially confusing in the case of DDL and DML statements, where 
> the Data Sink is not displayed. This makes a SELECT * FROM table 
> indistinguishable from a CREATE TABLE, since both display only the SCAN 
> node and omit the WRITE_TO_HDFS and SELECT nodes.
> It would make sense to include the WRITE_TO_HDFS node in DML/DDL plans.






[jira] [Commented] (IMPALA-13074) WRITE TO HDFS node is omitted from Web UI graphic plan

2024-05-21 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848422#comment-17848422
 ] 

Quanlong Huang commented on IMPALA-13074:
-

Names like "HDFS WRITER" and "KUDU WRITER" would be consistent with the ExecSummary.

> WRITE TO HDFS node is omitted from Web UI graphic plan
> --
>
> Key: IMPALA-13074
> URL: https://issues.apache.org/jira/browse/IMPALA-13074
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Noemi Pap-Takacs
>Priority: Major
>
> The query plan shows the nodes that take part in the execution, forming a 
> tree structure.
> It can be displayed in the CLI by issuing the EXPLAIN command. When the 
> actual query is executed, the plan tree can also be viewed in the Impala 
> Web UI in graphic form.
> However, the explain string and the graphic plan tree do not match: the top 
> node is missing from the Web UI.
> This is especially confusing in the case of DDL and DML statements, where 
> the Data Sink is not displayed. This makes a SELECT * FROM table 
> indistinguishable from a CREATE TABLE, since both display only the SCAN 
> node and omit the WRITE_TO_HDFS and SELECT nodes.
> It would make sense to include the WRITE_TO_HDFS node in DML/DDL plans.






[jira] [Commented] (IMPALA-13102) Loading tables with illegal stats failed

2024-05-21 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848395#comment-17848395
 ] 

Quanlong Huang commented on IMPALA-13102:
-

Uploaded a patch for review: https://gerrit.cloudera.org/c/21445/

> Loading tables with illegal stats failed
> 
>
> Key: IMPALA-13102
> URL: https://issues.apache.org/jira/browse/IMPALA-13102
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> When the table has illegal stats, e.g. numDVs=-100, Impala can't load the 
> table, so DROP STATS or DROP TABLE can't be performed on the table.
> {code:sql}
> [localhost:21050] default> drop stats alltypes_bak;
> Query: drop stats alltypes_bak
> ERROR: AnalysisException: Failed to load metadata for table: 'alltypes_bak'
> CAUSED BY: TableLoadingException: Failed to load metadata for table: 
> default.alltypes_bak
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, 
> avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, 
> numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}{code}
> We should allow at least dropping the stats or dropping the table, so the 
> user can use Impala to recover the stats.
> Stacktrace in the logs:
> {noformat}
> I0520 08:00:56.661746 17543 jni-util.cc:321] 
> 5343142d1173494f:44dcde8c] 
> org.apache.impala.common.AnalysisException: Failed to load metadata for 
> table: 'alltypes_bak'
> at 
> org.apache.impala.analysis.Analyzer.resolveTableRef(Analyzer.java:974)
> at 
> org.apache.impala.analysis.DropStatsStmt.analyze(DropStatsStmt.java:94)
> at 
> org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:551)
> at 
> org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:498)
> at 
> org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2542)
> at 
> org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
> at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175)
> Caused by: org.apache.impala.catalog.TableLoadingException: Failed to load 
> metadata for table: default.alltypes_bak
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, 
> avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, 
> numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
> at 
> org.apache.impala.catalog.IncompleteTable.loadFromThrift(IncompleteTable.java:162)
> at org.apache.impala.catalog.Table.fromThrift(Table.java:586)
> at 
> org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:479)
> at 
> org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334)
> at 
> org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262)
> at 
> org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114)
> at 
> org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:585)
> at 
> org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196)
> at .: 
> org.apache.impala.catalog.TableLoadingException: Failed to load metadata for 
> table: default.alltypes_bak
> at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1318)
> at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1213)
> at org.apache.impala.catalog.TableLoader.load(TableLoader.java:145)
> at 
> org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:251)
> at 
> org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:247)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750)
> Caused by: java.lang.IllegalStateException: ColumnStats{avgSize_=4.0, 
> avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, 
> numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:512)
> at 
> org.apache.impala.catalog.ColumnStats.validate(ColumnStats.java:1034)
> at org.apache.impala.catalog.ColumnStats.update(ColumnStats.java:676)
> at org.apache.impala.catalog.Column.updateStats(Column.java:73)
> at 
> org.apache.impala.catalog.FeCatalogUtils.injectColumnStats(FeCatalogUtils.java:183)
> at 

[jira] [Created] (IMPALA-13103) Corrupt column stats are not reported

2024-05-20 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13103:
---

 Summary: Corrupt column stats are not reported
 Key: IMPALA-13103
 URL: https://issues.apache.org/jira/browse/IMPALA-13103
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Reporter: Quanlong Huang


Impala will report corrupt table stats in the query plan. However, corrupt 
column stats are not reported. For instance, consider the following table:
{code:sql}
create table t1 (id int, name string);
insert into t1 values (1, 'aaa'), (2, 'aaa'), (3, 'aaa'), (4, 'aaa');{code}
with the following stats:
{code:sql}
alter table t1 set tblproperties('numRows'='4');
alter table t1 set column stats name ('numNulls'='0');{code}
Note that column "id" has missing stats and column "name" has missing/corrupt 
stats (ndv=-1, numNulls=0).
Grouping by "id" will report the missing stats:
{code:sql}
explain select id, count(*) from t1 group by id;

WARNING: The following tables are missing relevant table and/or column 
statistics.
default.t1{code}
However, grouping by "name" doesn't report the missing/corrupt stats:
{noformat}
explain select name, count(*) from t1 group by name;
+---+
| Explain String |
+---+
| Max Per-Host Resource Reservation: Memory=38.00MB Threads=2 |
| Per-Host Resource Estimates: Memory=144MB |
| Codegen disabled by planner |
| Analyzed query: SELECT name, count(*) FROM `default`.t1 GROUP BY name |
| |
| F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1 |
| |  Per-Host Resources: mem-estimate=144.00MB mem-reservation=38.00MB thread-reservation=2 |
| PLAN-ROOT SINK |
| |  output exprs: name, count(*) |
| |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0 |
| | |
| 01:AGGREGATE [FINALIZE] |
| |  output: count(*) |
| |  group by: name |
| |  mem-estimate=128.00MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0 |
| |  tuple-ids=1 row-size=20B cardinality=4 |
| |  in pipelines: 01(GETNEXT), 00(OPEN) |
| | |
| 00:SCAN HDFS [default.t1] |
|    HDFS partitions=1/1 files=1 size=24B |
|    stored statistics: |
|      table: rows=4 size=unavailable |
|      columns: all |
|    extrapolated-rows=disabled max-scan-range-rows=4 |
|    mem-estimate=16.00MB mem-reservation=8.00KB thread-reservation=1 |
|    tuple-ids=0 row-size=12B cardinality=4 |
|    in pipelines: 00(GETNEXT) |
+---+
{noformat}
CC [~rizaon]
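The gap can be illustrated with a toy check. This is a hedged sketch with hypothetical names, not Impala's actual planner logic: a warning check that only fires when stats are completely absent will skip a column like "name", whose numNulls is set but whose NDV is still unknown or illegal.

```python
# Hypothetical sketch, not Impala's planner code: flag missing *and* corrupt
# column stats. By convention ndv == -1 means "unknown"; any other negative
# value (e.g. -100) is corrupt.
def needs_stats_warning(ndv: int, num_nulls: int) -> bool:
    if ndv < -1:
        return True   # corrupt NDV, e.g. -100
    if ndv == -1:
        return True   # NDV missing, even if numNulls happens to be set
    return False

# Column "id": no stats at all -> warned today.
assert needs_stats_warning(ndv=-1, num_nulls=-1)
# Column "name": numNulls=0 is set but NDV is still unknown -> should also warn.
assert needs_stats_warning(ndv=-1, num_nulls=0)
# Healthy stats -> no warning.
assert not needs_stats_warning(ndv=10, num_nulls=0)
```

The point of the sketch is that the warning condition should key on each relevant stat (here, NDV of the group-by column), not on whether any stats object exists for the column.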



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org




[jira] [Commented] (IMPALA-13102) Loading tables with illegal stats failed

2024-05-19 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847742#comment-17847742
 ] 

Quanlong Huang commented on IMPALA-13102:
-

In the Impala dev environment, I can set the stats directly in PostgreSQL:
{code:sql}
psql -q -U hiveuser -d ${METASTORE_DB}

HMS_home_quanlong_workspace_Impala_cdp=> select "TBL_ID" from "TBLS" where 
"TBL_NAME" = 'alltypes_bak';
 TBL_ID 

 244931
(1 row)
HMS_home_quanlong_workspace_Impala_cdp=>  select "CS_ID", "DB_NAME", 
"TABLE_NAME", "COLUMN_NAME", "NUM_DISTINCTS" from "TAB_COL_STATS" where 
"TBL_ID" = 244931;
 CS_ID | DB_NAME |  TABLE_NAME  |   COLUMN_NAME   | NUM_DISTINCTS 
---+-+--+-+---
 68767 | default | alltypes_bak | double_col  |10
 68766 | default | alltypes_bak | id  |  7300
 68765 | default | alltypes_bak | tinyint_col |10
 68764 | default | alltypes_bak | timestamp_col   |  7300
 68763 | default | alltypes_bak | smallint_col|10
 68762 | default | alltypes_bak | date_string_col |   736
 68761 | default | alltypes_bak | string_col  |10
 68760 | default | alltypes_bak | float_col   |10
 68759 | default | alltypes_bak | bigint_col  |10
 68758 | default | alltypes_bak | year| 2
 68757 | default | alltypes_bak | bool_col|  
 68756 | default | alltypes_bak | int_col |10
(12 rows)
HMS_home_quanlong_workspace_Impala_cdp=> UPDATE "TAB_COL_STATS" SET 
"NUM_DISTINCTS" = -100 where "CS_ID" = 68766;
HMS_home_quanlong_workspace_Impala_cdp=> select "CS_ID", "DB_NAME", 
"TABLE_NAME", "COLUMN_NAME", "NUM_DISTINCTS" from "TAB_COL_STATS" where "CS_ID" 
= 68766;
 CS_ID | DB_NAME |  TABLE_NAME  | COLUMN_NAME | NUM_DISTINCTS 
---+-+--+-+---
 68766 | default | alltypes_bak | id  |  -100
(1 row)
{code}

> Loading tables with illegal stats failed
> 
>
> Key: IMPALA-13102
> URL: https://issues.apache.org/jira/browse/IMPALA-13102
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> When the table has illegal stats, e.g. numDVs=-100, Impala can't load the 
> table. So DROP STATS or DROP TABLE can't be perform on the table.
> {code:sql}
> [localhost:21050] default> drop stats alltypes_bak;
> Query: drop stats alltypes_bak
> ERROR: AnalysisException: Failed to load metadata for table: 'alltypes_bak'
> CAUSED BY: TableLoadingException: Failed to load metadata for table: 
> default.alltypes_bak
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, 
> avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, 
> numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}{code}
> We should allow at least dropping the stats or dropping the table. So user 
> can use Impala to recover the stats.
> Stacktrace in the logs:
> {noformat}
> I0520 08:00:56.661746 17543 jni-util.cc:321] 
> 5343142d1173494f:44dcde8c] 
> org.apache.impala.common.AnalysisException: Failed to load metadata for 
> table: 'alltypes_bak'
> at 
> org.apache.impala.analysis.Analyzer.resolveTableRef(Analyzer.java:974)
> at 
> org.apache.impala.analysis.DropStatsStmt.analyze(DropStatsStmt.java:94)
> at 
> org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:551)
> at 
> org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:498)
> at 
> org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2542)
> at 
> org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
> at 
> org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
> at 
> org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175)
> Caused by: org.apache.impala.catalog.TableLoadingException: Failed to load 
> metadata for table: default.alltypes_bak
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, 
> avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, 
> numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
> at 
> org.apache.impala.catalog.IncompleteTable.loadFromThrift(IncompleteTable.java:162)
> at org.apache.impala.catalog.Table.fromThrift(Table.java:586)
> at 
> org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:479)
> at 
> org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334)
> at 
> org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262)
> at 
> 

[jira] [Updated] (IMPALA-13102) Loading tables with illegal stats failed

2024-05-19 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-13102:

Description: 
When a table has illegal stats, e.g. numDVs=-100, Impala can't load the 
table, so DROP STATS or DROP TABLE can't be performed on it.

{code:sql}
[localhost:21050] default> drop stats alltypes_bak;
Query: drop stats alltypes_bak
ERROR: AnalysisException: Failed to load metadata for table: 'alltypes_bak'
CAUSED BY: TableLoadingException: Failed to load metadata for table: 
default.alltypes_bak
CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, 
avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, 
numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}{code}

We should at least allow dropping the stats or dropping the table, so the user 
can use Impala to recover the stats.

Stacktrace in the logs:
{noformat}
I0520 08:00:56.661746 17543 jni-util.cc:321] 5343142d1173494f:44dcde8c] 
org.apache.impala.common.AnalysisException: Failed to load metadata for table: 
'alltypes_bak'
at 
org.apache.impala.analysis.Analyzer.resolveTableRef(Analyzer.java:974)
at 
org.apache.impala.analysis.DropStatsStmt.analyze(DropStatsStmt.java:94)
at 
org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:551)
at 
org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:498)
at 
org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2542)
at 
org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
at 
org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
at 
org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175)
Caused by: org.apache.impala.catalog.TableLoadingException: Failed to load 
metadata for table: default.alltypes_bak
CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, 
avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, 
numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
at 
org.apache.impala.catalog.IncompleteTable.loadFromThrift(IncompleteTable.java:162)
at org.apache.impala.catalog.Table.fromThrift(Table.java:586)
at 
org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:479)
at 
org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334)
at 
org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262)
at 
org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114)
at 
org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:585)
at 
org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196)
at .: 
org.apache.impala.catalog.TableLoadingException: Failed to load metadata for 
table: default.alltypes_bak
at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1318)
at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1213)
at org.apache.impala.catalog.TableLoader.load(TableLoader.java:145)
at 
org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:251)
at 
org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:247)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalStateException: ColumnStats{avgSize_=4.0, 
avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, 
numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:512)
at org.apache.impala.catalog.ColumnStats.validate(ColumnStats.java:1034)
at org.apache.impala.catalog.ColumnStats.update(ColumnStats.java:676)
at org.apache.impala.catalog.Column.updateStats(Column.java:73)
at 
org.apache.impala.catalog.FeCatalogUtils.injectColumnStats(FeCatalogUtils.java:183)
at org.apache.impala.catalog.Table.loadAllColumnStats(Table.java:513)
at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1269)
... 8 more{noformat}
CC [~VenuReddy] [~hemanth619] [~ngangam]

  was:
When the table has illegal stats, e.g. numDVs=-100, Impala can't load the 
table. So DROP STATS or DROP TABLE can't be perform on the table.

{code:sql}
[localhost:21050] default> drop stats alltypes_bak;
Query: drop stats alltypes_bak
ERROR: AnalysisException: Failed to load metadata for table: 'alltypes_bak'
CAUSED BY: TableLoadingException: Failed to load metadata for table: 
default.alltypes_bak
CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, 

[jira] [Created] (IMPALA-13102) Loading tables with illegal stats failed

2024-05-19 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13102:
---

 Summary: Loading tables with illegal stats failed
 Key: IMPALA-13102
 URL: https://issues.apache.org/jira/browse/IMPALA-13102
 Project: IMPALA
  Issue Type: Bug
  Components: Catalog
Reporter: Quanlong Huang
Assignee: Quanlong Huang


When a table has illegal stats, e.g. numDVs=-100, Impala can't load the 
table, so DROP STATS or DROP TABLE can't be performed on it.

{code:sql}
[localhost:21050] default> drop stats alltypes_bak;
Query: drop stats alltypes_bak
ERROR: AnalysisException: Failed to load metadata for table: 'alltypes_bak'
CAUSED BY: TableLoadingException: Failed to load metadata for table: 
default.alltypes_bak
CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, 
avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, 
numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}{code}

We should at least allow dropping the stats or dropping the table, so the user 
can use Impala to recover the stats.

Stacktrace in the logs:
{noformat}
I0520 08:00:56.661746 17543 jni-util.cc:321] 5343142d1173494f:44dcde8c] 
org.apache.impala.common.AnalysisException: Failed to load metadata for table: 
'alltypes_bak'
at 
org.apache.impala.analysis.Analyzer.resolveTableRef(Analyzer.java:974)
at 
org.apache.impala.analysis.DropStatsStmt.analyze(DropStatsStmt.java:94)
at 
org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:551)
at 
org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:498)
at 
org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2542)
at 
org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
at 
org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
at 
org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175)
Caused by: org.apache.impala.catalog.TableLoadingException: Failed to load 
metadata for table: default.alltypes_bak
CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, 
avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, 
numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
at 
org.apache.impala.catalog.IncompleteTable.loadFromThrift(IncompleteTable.java:162)
at org.apache.impala.catalog.Table.fromThrift(Table.java:586)
at 
org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:479)
at 
org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334)
at 
org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262)
at 
org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114)
at 
org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:585)
at 
org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196)
at .: 
org.apache.impala.catalog.TableLoadingException: Failed to load metadata for 
table: default.alltypes_bak
at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1318)
at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1213)
at org.apache.impala.catalog.TableLoader.load(TableLoader.java:145)
at 
org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:251)
at 
org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:247)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalStateException: ColumnStats{avgSize_=4.0, 
avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, 
numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
at 
com.google.common.base.Preconditions.checkState(Preconditions.java:512)
at org.apache.impala.catalog.ColumnStats.validate(ColumnStats.java:1034)
at org.apache.impala.catalog.ColumnStats.update(ColumnStats.java:676)
at org.apache.impala.catalog.Column.updateStats(Column.java:73)
at 
org.apache.impala.catalog.FeCatalogUtils.injectColumnStats(FeCatalogUtils.java:183)
at org.apache.impala.catalog.Table.loadAllColumnStats(Table.java:513)
at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1269)
... 8 more{noformat}
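The root cause above is a hard Preconditions.checkState in ColumnStats.validate() that aborts the whole table load. As a sketch of the direction proposed here (illustrative Python, not the actual fix), validation could clamp illegal values to the "unknown" sentinel and record the corruption instead of throwing, so the table still loads and DROP STATS / DROP TABLE remain possible:

```python
# Illustrative sketch only (not Impala's code): tolerate illegal stats at load
# time by clamping them to the "unknown" sentinel (-1) and remembering that
# the column had corrupt stats, instead of raising and failing the table load.
def sanitize_ndv(num_dvs: int) -> tuple[int, bool]:
    """Return (usable ndv, was_corrupt)."""
    if num_dvs < -1:
        return -1, True   # e.g. numDVs=-100 written directly into the metastore
    return num_dvs, False

assert sanitize_ndv(-100) == (-1, True)    # corrupt -> clamped and flagged
assert sanitize_ndv(-1) == (-1, False)     # merely unknown
assert sanitize_ndv(7300) == (7300, False) # legal value passes through
```

The was_corrupt flag could then feed the same "corrupt stats" reporting path that table stats already use, rather than being silently dropped.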



--
This message was sent by Atlassian Jira
(v8.20.10#820010)






[jira] [Created] (IMPALA-13094) Query links in /admission page of admissiond doesn't work

2024-05-17 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13094:
---

 Summary: Query links in /admission page of admissiond doesn't work
 Key: IMPALA-13094
 URL: https://issues.apache.org/jira/browse/IMPALA-13094
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Quanlong Huang
 Attachments: Selection_115.png, Selection_116.png

In the /admission page, there are records for queued and running queries. The 
detail links for these queries use the hostname of the admissiond; instead, 
they should point to the corresponding coordinators.

Clicking such a link jumps to the /query_plan endpoint of the admissiond, which 
doesn't exist, so it fails with "Error: No URI handler for '/query_plan'".

Attached the screenshots for reference.

CC [~arawat] 
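A minimal sketch of the intended behavior (hypothetical field names; the actual webserver template variables may differ): build each query's detail link from that query's coordinator address rather than from the admissiond's own hostname.

```python
from urllib.parse import urlunsplit

# Hypothetical sketch: link to the coordinator that owns the query, not to
# the admissiond (which has no /query_plan handler).
def query_detail_link(coordinator_host: str, webui_port: int, query_id: str) -> str:
    return urlunsplit(
        ("http", f"{coordinator_host}:{webui_port}", "/query_plan",
         f"query_id={query_id}", ""))

assert query_detail_link("coord-1.example.com", 25000,
                         "fe4ac1be6462a13f:362a9b5b") == \
    "http://coord-1.example.com:25000/query_plan?query_id=fe4ac1be6462a13f:362a9b5b"
```

This assumes the admission controller already knows each query's coordinator (it must, to admit the query), so only the link template needs to change.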



--
This message was sent by Atlassian Jira
(v8.20.10#820010)





[jira] [Updated] (IMPALA-13094) Query links in /admission page of admissiond doesn't work

2024-05-17 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-13094:

Attachment: Selection_116.png

> Query links in /admission page of admissiond doesn't work
> -
>
> Key: IMPALA-13094
> URL: https://issues.apache.org/jira/browse/IMPALA-13094
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Quanlong Huang
>Priority: Critical
> Attachments: Selection_115.png, Selection_116.png
>
>
> In the /admission page, there are records for queued queries and running 
> queries. The details links for these queries use the hostname of the 
> admissiond. Instead, they should point to the corresponding coordinators.
> Clicking on the link will jump to the /query_plan endpoint of the admissiond 
> which doesn't exist. Thus failed by Error: No URI handler for '/query_plan'.
> Attached the screenshots for reference.
> CC [~arawat] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Updated] (IMPALA-13094) Query links in /admission page of admissiond doesn't work

2024-05-17 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-13094:

Attachment: Selection_115.png

> Query links in /admission page of admissiond doesn't work
> -
>
> Key: IMPALA-13094
> URL: https://issues.apache.org/jira/browse/IMPALA-13094
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Quanlong Huang
>Priority: Critical
> Attachments: Selection_115.png, Selection_116.png
>
>
> In the /admission page, there are records for queued queries and running 
> queries. The details links for these queries use the hostname of the 
> admissiond. Instead, they should point to the corresponding coordinators.
> Clicking on the link will jump to the /query_plan endpoint of the admissiond 
> which doesn't exist. Thus failed by Error: No URI handler for '/query_plan'.
> Attached the screenshots for reference.
> CC [~arawat] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Created] (IMPALA-13093) Insert into Huawei OBS table failed

2024-05-16 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13093:
---

 Summary: Insert into Huawei OBS table failed
 Key: IMPALA-13093
 URL: https://issues.apache.org/jira/browse/IMPALA-13093
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Affects Versions: Impala 4.3.0
Reporter: Quanlong Huang
Assignee: Quanlong Huang


Inserting into a table that uses Huawei OBS (Object Storage Service) as its 
storage fails with the following error:
{noformat}
Query: insert into test_obs1 values (1, 'abc')

ERROR: Failed to get info on temporary HDFS file: 
obs://obs-test-ee93/input/test_obs1/_impala_insert_staging/fe4ac1be6462a13f_362a9b5b/.fe4ac1be6462a13f-362a9b5b_1213692075_dir//fe4ac1be6462a13f-362a9b5b_375832652_data.0.txt
Error(2): No such file or directory {noformat}
Looking into the logs:
{noformat}
I0516 16:40:55.663640 18922 status.cc:129] fe4ac1be6462a13f:362a9b5b] 
Failed to get info on temporary HDFS file: 
obs://obs-test-ee93/input/test_obs1/_impala_insert_staging/fe4ac1be6462a13f_362a9b5b/.fe4ac1be6462a13f-362a9b5b_1213692075_dir//fe4ac1be6462a13f-362a9b5b_375832652_data.0.txt
Error(2): No such file or directory
@   0xfc6d44  impala::Status::Status()
@  0x1c42020  impala::HdfsTableSink::CreateNewTmpFile()
@  0x1c44357  impala::HdfsTableSink::InitOutputPartition()
@  0x1c4988a  impala::HdfsTableSink::GetOutputPartition()
@  0x1c46569  impala::HdfsTableSink::Send()
@  0x14ee25f  impala::FragmentInstanceState::ExecInternal()
@  0x14efca3  impala::FragmentInstanceState::Exec()
@  0x148dc4c  impala::QueryState::ExecFInstance()
@  0x1b3bab9  impala::Thread::SuperviseThread()
@  0x1b3cdb1  boost::detail::thread_data<>::run()
@  0x2474a87  thread_proxy
@ 0x7fe5a562dea5  start_thread
@ 0x7fe5a25ddb0d  __clone{noformat}
Note that impalad is started with {{--symbolize_stacktrace=true}} so the 
stacktrace has symbols.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)






[jira] [Updated] (IMPALA-13086) Cardinality estimate of AggregationNode should consider predicates on group-by columns

2024-05-15 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-13086:

Attachment: plan.txt

> Cardinality estimate of AggregationNode should consider predicates on 
> group-by columns
> --
>
> Key: IMPALA-13086
> URL: https://issues.apache.org/jira/browse/IMPALA-13086
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Quanlong Huang
>Priority: Critical
> Attachments: plan.txt
>
>
> Consider the following tables:
> {code:sql}
> CREATE EXTERNAL TABLE t1(
>   t1_id bigint,
>   t5_id bigint,
>   t5_name string,
>   register_date string
> ) stored as textfile;
> CREATE EXTERNAL TABLE t2(
>   t1_id bigint,
>   t3_id bigint,
>   pay_time timestamp,
>   refund_time timestamp,
>   state_code int
> ) stored as textfile;
> CREATE EXTERNAL TABLE t3(
>   t3_id bigint,
>   t3_name string,
>   class_id int
> ) stored as textfile;
> CREATE EXTERNAL TABLE t5( 
>   id bigint,
>   t5_id bigint,
>   t5_name string,
>   branch_id bigint,
>   branch_name string
> ) stored as textfile;
> alter table t1 set tblproperties('numRows'='6031170829');
> alter table t1 set column stats t1_id ('numDVs'='8131016','numNulls'='0');
> alter table t1 set column stats t5_id ('numDVs'='389','numNulls'='0');
> alter table t1 set column stats t5_name 
> ('numDVs'='523','numNulls'='85928157','maxsize'='27','avgSize'='17.79120063781738');
> alter table t1 set column stats register_date 
> ('numDVs'='9283','numNulls'='0','maxsize'='8','avgSize'='8');
> alter table t2 set tblproperties('numRows'='864341085');
> alter table t2 set column stats t1_id ('numDVs'='1007302','numNulls'='0');
> alter table t2 set column stats t3_id ('numDVs'='5013','numNulls'='2800503');
> alter table t2 set column stats pay_time ('numDVs'='1372020','numNulls'='0');
> alter table t2 set column stats refund_time 
> ('numDVs'='251658','numNulls'='791645118');
> alter table t2 set column stats state_code ('numDVs'='8','numNulls'='0');
> alter table t3 set tblproperties('numRows'='4452');
> alter table t3 set column stats t3_id ('numDVs'='4452','numNulls'='0');
> alter table t3 set column stats t3_name 
> ('numDVs'='4452','numNulls'='0','maxsize'='176','avgSize'='37.60469818115234');
> alter table t3 set column stats class_id ('numDVs'='75','numNulls'='0');
> alter table t5 set tblproperties('numRows'='2177245');
> alter table t5 set column stats t5_id ('numDVs'='826','numNulls'='0');
> alter table t5 set column stats t5_name 
> ('numDVs'='523','numNulls'='0','maxsize'='67','avgSize'='19.12560081481934');
> alter table t5 set column stats branch_id ('numDVs'='53','numNulls'='0');
> alter table t5 set column stats branch_name 
> ('numDVs'='55','numNulls'='0','maxsize'='61','avgSize'='16.05229949951172');
> {code}
> Put a data file into each table to make the stats valid:
> {code:bash}
> echo '2024' > data.txt
> hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t1
> hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t2
> hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t3
> hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t5
> {code}
> REFRESH these tables after adding the data files.
> The cardinalities of the AggregationNodes are overestimated in the following query:
> {code:sql}
> explain select 
>   register_date,
>   t4.t5_id, 
>   t5.t5_name,
>   t5.branch_name,
>   count(distinct t1_id),
>   count(distinct case when diff_day=0 then t1_id else null end ),
>   count(distinct case when diff_day<=3 then t1_id else null end ),
>   count(distinct case when diff_day<=7 then t1_id else null end ),
>   count(distinct case when diff_day<=14 then t1_id else null end ),
>   count(distinct case when diff_day<=30 then t1_id else null end ),
>   count(distinct case when diff_day<=60 then t1_id else null end ),
>   count(distinct case when pay_time is not null then t1_id else null end )
> from (
>   select t1.t1_id,t1.register_date,t1.t5_id,t2.pay_time,t2.t3_id,t3.t3_name,
> datediff(pay_time,register_date) diff_day
>   from (
> select t1_id,pay_time,t3_id from t2
> where state_code = 0 and pay_time>=trunc(NOW(),'Y')
>   and cast(pay_time as date) <> cast(refund_time as date)
>   )t2
>   join t3 on t2.t3_id=t3.t3_id
>   right join t1 on t1.t1_id=t2.t1_id
> )t4
> left join t5 on t4.t5_id=t5.t5_id
> where register_date='20230515'
> group by register_date,t4.t5_id,t5.t5_name,t5.branch_name;{code}
> One of the AggregationNode:
> {noformat}
> 17:AGGREGATE [FINALIZE]
> |  Class 0
> |output: count:merge(t1_id)
> |group by: register_date, t4.t5_id, t5.t5_name, t5.branch_name
> |  Class 1
> |output: count:merge(CASE WHEN diff_day = 0 THEN t1_id ELSE NULL END)
> |group 

[jira] [Created] (IMPALA-13086) Cardinality estimate of AggregationNode should consider predicates on group-by columns

2024-05-15 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13086:
---

 Summary: Cardinality estimate of AggregationNode should consider 
predicates on group-by columns
 Key: IMPALA-13086
 URL: https://issues.apache.org/jira/browse/IMPALA-13086
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Reporter: Quanlong Huang


Consider the following tables:
{code:sql}
CREATE EXTERNAL TABLE t1(
  t1_id bigint,
  t5_id bigint,
  t5_name string,
  register_date string
) stored as textfile;

CREATE EXTERNAL TABLE t2(
  t1_id bigint,
  t3_id bigint,
  pay_time timestamp,
  refund_time timestamp,
  state_code int
) stored as textfile;

CREATE EXTERNAL TABLE t3(
  t3_id bigint,
  t3_name string,
  class_id int
) stored as textfile;

CREATE EXTERNAL TABLE t5( 
  id bigint,
  t5_id bigint,
  t5_name string,
  branch_id bigint,
  branch_name string
) stored as textfile;

alter table t1 set tblproperties('numRows'='6031170829');
alter table t1 set column stats t1_id ('numDVs'='8131016','numNulls'='0');
alter table t1 set column stats t5_id ('numDVs'='389','numNulls'='0');
alter table t1 set column stats t5_name 
('numDVs'='523','numNulls'='85928157','maxsize'='27','avgSize'='17.79120063781738');
alter table t1 set column stats register_date 
('numDVs'='9283','numNulls'='0','maxsize'='8','avgSize'='8');

alter table t2 set tblproperties('numRows'='864341085');
alter table t2 set column stats t1_id ('numDVs'='1007302','numNulls'='0');
alter table t2 set column stats t3_id ('numDVs'='5013','numNulls'='2800503');
alter table t2 set column stats pay_time ('numDVs'='1372020','numNulls'='0');
alter table t2 set column stats refund_time 
('numDVs'='251658','numNulls'='791645118');
alter table t2 set column stats state_code ('numDVs'='8','numNulls'='0');

alter table t3 set tblproperties('numRows'='4452');
alter table t3 set column stats t3_id ('numDVs'='4452','numNulls'='0');
alter table t3 set column stats t3_name 
('numDVs'='4452','numNulls'='0','maxsize'='176','avgSize'='37.60469818115234');
alter table t3 set column stats class_id ('numDVs'='75','numNulls'='0');

alter table t5 set tblproperties('numRows'='2177245');
alter table t5 set column stats t5_id ('numDVs'='826','numNulls'='0');
alter table t5 set column stats t5_name 
('numDVs'='523','numNulls'='0','maxsize'='67','avgSize'='19.12560081481934');
alter table t5 set column stats branch_id ('numDVs'='53','numNulls'='0');
alter table t5 set column stats branch_name 
('numDVs'='55','numNulls'='0','maxsize'='61','avgSize'='16.05229949951172');
{code}
Put a data file into each table to make the stats valid:
{code:bash}
echo '2024' > data.txt
hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t1
hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t2
hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t3
hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t5
{code}
REFRESH these tables after adding the data files.

The cardinalities of the AggregationNodes are overestimated in the following query:
{code:sql}
explain select 
  register_date,
  t4.t5_id, 
  t5.t5_name,
  t5.branch_name,
  count(distinct t1_id),
  count(distinct case when diff_day=0 then t1_id else null end ),
  count(distinct case when diff_day<=3 then t1_id else null end ),
  count(distinct case when diff_day<=7 then t1_id else null end ),
  count(distinct case when diff_day<=14 then t1_id else null end ),
  count(distinct case when diff_day<=30 then t1_id else null end ),
  count(distinct case when diff_day<=60 then t1_id else null end ),
  count(distinct case when pay_time is not null then t1_id else null end )
from (
  select t1.t1_id,t1.register_date,t1.t5_id,t2.pay_time,t2.t3_id,t3.t3_name,
datediff(pay_time,register_date) diff_day
  from (
select t1_id,pay_time,t3_id from t2
where state_code = 0 and pay_time>=trunc(NOW(),'Y')
  and cast(pay_time as date) <> cast(refund_time as date)
  )t2
  join t3 on t2.t3_id=t3.t3_id
  right join t1 on t1.t1_id=t2.t1_id
)t4
left join t5 on t4.t5_id=t5.t5_id
where register_date='20230515'
group by register_date,t4.t5_id,t5.t5_name,t5.branch_name;{code}
One of the AggregationNode:
{noformat}
17:AGGREGATE [FINALIZE]
|  Class 0
|output: count:merge(t1_id)
|group by: register_date, t4.t5_id, t5.t5_name, t5.branch_name
|  Class 1
|output: count:merge(CASE WHEN diff_day = 0 THEN t1_id ELSE NULL END)
|group by: register_date, t4.t5_id, t5.t5_name, t5.branch_name
|  Class 2
|output: count:merge(CASE WHEN diff_day <= 3 THEN t1_id ELSE NULL END)
|group by: register_date, t4.t5_id, t5.t5_name, t5.branch_name
|  Class 3
|output: count:merge(CASE WHEN diff_day <= 7 THEN t1_id ELSE NULL END)
|group by: register_date, t4.t5_id, t5.t5_name, t5.branch_name
|  Class 4
|output: count:merge(CASE WHEN diff_day <= 14 THEN t1_id ELSE NULL END)
|group by: register_date, t4.t5_id, 
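The overestimation the issue describes can be illustrated with a small back-of-the-envelope sketch. This is a hedged simplification, not Impala's actual planner code: it models the classic group-by estimate (product of per-column NDVs, capped by input rows) and shows how accounting for the equality predicate register_date='20230515' (which pins that column's effective NDV to 1) shrinks the estimate, using the column stats declared above.

```python
# Hedged sketch (assumed simplification, NOT Impala's planner code):
# NDV-product group-by cardinality estimation, with and without
# adjusting for a predicate that fixes one grouping column.

def groupby_cardinality(ndvs, input_rows):
    """Classic estimate: product of per-column NDVs, capped by input rows."""
    est = 1
    for ndv in ndvs:
        est *= ndv
    return min(est, input_rows)

# Column stats from the repro above: register_date=9283 NDVs,
# t5_id=389, t5_name=523, branch_name=55; t1 has ~6.03B rows.
ndvs = {"register_date": 9283, "t5_id": 389, "t5_name": 523, "branch_name": 55}
input_rows = 6_031_170_829

naive = groupby_cardinality(ndvs.values(), input_rows)

# register_date='20230515' fixes that column to one value, so its
# effective NDV for the aggregation above the predicate should be 1.
adjusted = dict(ndvs, register_date=1)
better = groupby_cardinality(adjusted.values(), input_rows)

print(naive, better)  # the adjusted estimate is far smaller
```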


[jira] [Commented] (IMPALA-13077) Equality predicate on partition column and uncorrelated subquery doesn't reduce the cardinality estimate

2024-05-15 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846770#comment-17846770
 ] 

Quanlong Huang commented on IMPALA-13077:
-

It seems doable:
 * catalogd always loads the HMS partition objects and 'numRows' is extracted 
from the parameters: 
[https://github.com/apache/impala/blob/f87c20800de9f7dc74e47aa9a8c0dc878f4f0840/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L1415]
 * coordinator always loads all partitions when planning such queries.

Pulling partition level column stats like NDVs will help more since they are 
more accurate than the table level column stats. But using the partition level 
'numRows' already helps a lot in this case.
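The cap proposed here can be sketched in a few lines. This is a hedged illustration under assumed names, not the actual frontend code: when an equality predicate on the partition key comes from an uncorrelated subquery, the join output cannot exceed the row count of the largest single partition, so the generic join estimate can be clamped by the max per-partition 'numRows'.

```python
# Hedged sketch (assumed simplification, NOT Impala's planner code):
# clamp a generic join cardinality estimate by the largest partition's
# 'numRows' when the join key is the partition column.

def join_cardinality_with_partition_cap(generic_join_estimate, partition_num_rows):
    """Cap the generic estimate by max per-partition row count, if known."""
    if not partition_num_rows:
        return generic_join_estimate  # no partition stats: keep the estimate
    return min(generic_join_estimate, max(partition_num_rows))

# Toy numbers (illustrative only): a 2.88M-row generic estimate vs.
# partitions whose largest holds 5000 rows.
partitions = [1600, 5000, 3200, 2400]
print(join_cardinality_with_partition_cap(2_880_000, partitions))  # -> 5000
```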

> Equality predicate on partition column and uncorrelated subquery doesn't 
> reduce the cardinality estimate
> 
>
> Key: IMPALA-13077
> URL: https://issues.apache.org/jira/browse/IMPALA-13077
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> Let's say 'part_tbl' is a partitioned table. Its partition key is 'part_key'. 
> Consider the following query:
> {code:sql}
> select xxx from part_tbl
> where part_key=(select ... from dim_tbl);
> {code}
> Its query plan is a JoinNode with two ScanNodes. When estimating the 
> cardinality of the JoinNode, the planner is not aware that 'part_key' is the 
> partition column and the cardinality of the JoinNode should not be larger 
> than the max row count across partitions.
> The recent work in IMPALA-12018 (Consider runtime filter for cardinality 
> reduction) helps in some cases since there are runtime filters on the 
> partition column. But there are still some cases that we overestimate the 
> cardinality. For instance, 'ss_sold_date_sk' is the only partition key of 
> tpcds.store_sales. The following query
> {code:sql}
> select count(*) from tpcds.store_sales
> where ss_sold_date_sk=(
>   select min(d_date_sk) + 1000 from tpcds.date_dim);{code}
> has query plan:
> {noformat}
> +-+
> | Explain String  |
> +-+
> | Max Per-Host Resource Reservation: Memory=18.94MB Threads=6 |
> | Per-Host Resource Estimates: Memory=243MB   |
> | |
> | PLAN-ROOT SINK  |
> | |   |
> | 09:AGGREGATE [FINALIZE] |
> | |  output: count:merge(*)   |
> | |  row-size=8B cardinality=1|
> | |   |
> | 08:EXCHANGE [UNPARTITIONED] |
> | |   |
> | 04:AGGREGATE|
> | |  output: count(*) |
> | |  row-size=8B cardinality=1|
> | |   |
> | 03:HASH JOIN [LEFT SEMI JOIN, BROADCAST]|
> | |  hash predicates: ss_sold_date_sk = min(d_date_sk) + 1000 |
> | |  runtime filters: RF000 <- min(d_date_sk) + 1000  |
> | |  row-size=4B cardinality=2.88M < Should be max(numRows) across 
> partitions
> | |   |
> | |--07:EXCHANGE [BROADCAST]  |
> | |  ||
> | |  06:AGGREGATE [FINALIZE]  |
> | |  |  output: min:merge(d_date_sk)  |
> | |  |  row-size=4B cardinality=1 |
> | |  ||
> | |  05:EXCHANGE [UNPARTITIONED]  |
> | |  ||
> | |  02:AGGREGATE |
> | |  |  output: min(d_date_sk)|
> | |  |  row-size=4B cardinality=1 |
> | |  ||
> | |  01:SCAN HDFS [tpcds.date_dim]|
> | | HDFS partitions=1/1 files=1 size=9.84MB   |
> | | row-size=4B cardinality=73.05K|
> | |   |
> | 00:SCAN HDFS [tpcds.store_sales]|
> |HDFS 

[jira] [Assigned] (IMPALA-13077) Equality predicate on partition column and uncorrelated subquery doesn't reduce the cardinality estimate

2024-05-14 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang reassigned IMPALA-13077:
---

Assignee: Quanlong Huang

> Equality predicate on partition column and uncorrelated subquery doesn't 
> reduce the cardinality estimate
> 
>
> Key: IMPALA-13077
> URL: https://issues.apache.org/jira/browse/IMPALA-13077
> Project: IMPALA
>  Issue Type: Bug
>  Components: Frontend
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> Let's say 'part_tbl' is a partitioned table. Its partition key is 'part_key'. 
> Consider the following query:
> {code:sql}
> select xxx from part_tbl
> where part_key=(select ... from dim_tbl);
> {code}
> Its query plan is a JoinNode with two ScanNodes. When estimating the 
> cardinality of the JoinNode, the planner is not aware that 'part_key' is the 
> partition column and the cardinality of the JoinNode should not be larger 
> than the max row count across partitions.
> The recent work in IMPALA-12018 (Consider runtime filter for cardinality 
> reduction) helps in some cases since there are runtime filters on the 
> partition column. But there are still some cases that we overestimate the 
> cardinality. For instance, 'ss_sold_date_sk' is the only partition key of 
> tpcds.store_sales. The following query
> {code:sql}
> select count(*) from tpcds.store_sales
> where ss_sold_date_sk=(
>   select min(d_date_sk) + 1000 from tpcds.date_dim);{code}
> has query plan:
> {noformat}
> +-+
> | Explain String  |
> +-+
> | Max Per-Host Resource Reservation: Memory=18.94MB Threads=6 |
> | Per-Host Resource Estimates: Memory=243MB   |
> | |
> | PLAN-ROOT SINK  |
> | |   |
> | 09:AGGREGATE [FINALIZE] |
> | |  output: count:merge(*)   |
> | |  row-size=8B cardinality=1|
> | |   |
> | 08:EXCHANGE [UNPARTITIONED] |
> | |   |
> | 04:AGGREGATE|
> | |  output: count(*) |
> | |  row-size=8B cardinality=1|
> | |   |
> | 03:HASH JOIN [LEFT SEMI JOIN, BROADCAST]|
> | |  hash predicates: ss_sold_date_sk = min(d_date_sk) + 1000 |
> | |  runtime filters: RF000 <- min(d_date_sk) + 1000  |
> | |  row-size=4B cardinality=2.88M < Should be max(numRows) across 
> partitions
> | |   |
> | |--07:EXCHANGE [BROADCAST]  |
> | |  ||
> | |  06:AGGREGATE [FINALIZE]  |
> | |  |  output: min:merge(d_date_sk)  |
> | |  |  row-size=4B cardinality=1 |
> | |  ||
> | |  05:EXCHANGE [UNPARTITIONED]  |
> | |  ||
> | |  02:AGGREGATE |
> | |  |  output: min(d_date_sk)|
> | |  |  row-size=4B cardinality=1 |
> | |  ||
> | |  01:SCAN HDFS [tpcds.date_dim]|
> | | HDFS partitions=1/1 files=1 size=9.84MB   |
> | | row-size=4B cardinality=73.05K|
> | |   |
> | 00:SCAN HDFS [tpcds.store_sales]|
> |HDFS partitions=1824/1824 files=1824 size=346.60MB   |
> |runtime filters: RF000 -> ss_sold_date_sk|
> |row-size=4B cardinality=2.88M|
> +-+{noformat}
> CC [~boroknagyz], [~rizaon]






[jira] [Updated] (IMPALA-9577) Use `system_unsync` time for Kudu test clusters

2024-05-14 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-9577:
---
Fix Version/s: Impala 3.4.2

> Use `system_unsync` time for Kudu test clusters
> ---
>
> Key: IMPALA-9577
> URL: https://issues.apache.org/jira/browse/IMPALA-9577
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Grant Henke
>Assignee: Grant Henke
>Priority: Major
> Fix For: Impala 4.0.0, Impala 3.4.2
>
>
> Recently Kudu made enhancements to time source configuration and adjusted the 
> time source for local clusters/tests to `system_unsync`. Impala should mirror 
> that behavior in Impala test clusters, given there is no need to require an 
> NTP-synchronized clock for a test where all the participating Kudu masters 
> and tablet servers run on the same node using the same local wallclock.
>  
> See the Kudu commit here for details: 
> [https://github.com/apache/kudu/commit/eb2b70d4b96be2fc2fdd6b3625acc284ac5774be]






[jira] [Created] (IMPALA-13077) Equality predicate on partition column and uncorrelated subquery doesn't reduce the cardinality estimate

2024-05-13 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13077:
---

 Summary: Equality predicate on partition column and uncorrelated 
subquery doesn't reduce the cardinality estimate
 Key: IMPALA-13077
 URL: https://issues.apache.org/jira/browse/IMPALA-13077
 Project: IMPALA
  Issue Type: Bug
  Components: Frontend
Reporter: Quanlong Huang


Let's say 'part_tbl' is a partitioned table. Its partition key is 'part_key'. 
Consider the following query:
{code:sql}
select xxx from part_tbl
where part_key=(select ... from dim_tbl);
{code}
Its query plan is a JoinNode with two ScanNodes. When estimating the 
cardinality of the JoinNode, the planner is not aware that 'part_key' is the 
partition column and the cardinality of the JoinNode should not be larger than 
the max row count across partitions.

The recent work in IMPALA-12018 (Consider runtime filter for cardinality 
reduction) helps in some cases since there are runtime filters on the partition 
column. But there are still some cases that we overestimate the cardinality. 
For instance, 'ss_sold_date_sk' is the only partition key of tpcds.store_sales. 
The following query
{code:sql}
select count(*) from tpcds.store_sales
where ss_sold_date_sk=(
  select min(d_date_sk) + 1000 from tpcds.date_dim);{code}
has query plan:
{noformat}
+-+
| Explain String  |
+-+
| Max Per-Host Resource Reservation: Memory=18.94MB Threads=6 |
| Per-Host Resource Estimates: Memory=243MB   |
| |
| PLAN-ROOT SINK  |
| |   |
| 09:AGGREGATE [FINALIZE] |
| |  output: count:merge(*)   |
| |  row-size=8B cardinality=1|
| |   |
| 08:EXCHANGE [UNPARTITIONED] |
| |   |
| 04:AGGREGATE|
| |  output: count(*) |
| |  row-size=8B cardinality=1|
| |   |
| 03:HASH JOIN [LEFT SEMI JOIN, BROADCAST]|
| |  hash predicates: ss_sold_date_sk = min(d_date_sk) + 1000 |
| |  runtime filters: RF000 <- min(d_date_sk) + 1000  |
| |  row-size=4B cardinality=2.88M < Should be max(numRows) across 
partitions
| |   |
| |--07:EXCHANGE [BROADCAST]  |
| |  ||
| |  06:AGGREGATE [FINALIZE]  |
| |  |  output: min:merge(d_date_sk)  |
| |  |  row-size=4B cardinality=1 |
| |  ||
| |  05:EXCHANGE [UNPARTITIONED]  |
| |  ||
| |  02:AGGREGATE |
| |  |  output: min(d_date_sk)|
| |  |  row-size=4B cardinality=1 |
| |  ||
| |  01:SCAN HDFS [tpcds.date_dim]|
| | HDFS partitions=1/1 files=1 size=9.84MB   |
| | row-size=4B cardinality=73.05K|
| |   |
| 00:SCAN HDFS [tpcds.store_sales]|
|HDFS partitions=1824/1824 files=1824 size=346.60MB   |
|runtime filters: RF000 -> ss_sold_date_sk|
|row-size=4B cardinality=2.88M|
+-+{noformat}
CC [~boroknagyz], [~rizaon]








[jira] [Created] (IMPALA-13071) Update the doc of Impala components

2024-05-11 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13071:
---

 Summary: Update the doc of Impala components
 Key: IMPALA-13071
 URL: https://issues.apache.org/jira/browse/IMPALA-13071
 Project: IMPALA
  Issue Type: Documentation
Reporter: Quanlong Huang


We need to update some descriptions in the doc of Impala components.
[https://impala.apache.org/docs/build/asf-site-html/topics/impala_components.html]

In the section of "The Impala Catalog Service", this is stale:
{quote}When you create a table, load data, and so on through Hive, you do need 
to issue REFRESH or INVALIDATE METADATA on an Impala daemon before executing a 
query there.
{quote}
We should mention "Automatic Invalidation/Refresh of Metadata", a.k.a. HMS 
event processor, and add links for it.

Change "Impala daemons" to "Impala Coordinators" in this sentence:
{quote}The Impala component known as the Catalog Service relays the metadata 
changes from Impala SQL statements to all the Impala 
{color:#de350b}daemons{color} in a cluster.
{quote}
Also add this link for "On-demand Metadata"
[https://impala.apache.org/docs/build/asf-site-html/topics/impala_metadata.html]








[jira] [Created] (IMPALA-13070) Introduce concepts in the query plan and execution

2024-05-11 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13070:
---

 Summary: Introduce concepts in the query plan and execution
 Key: IMPALA-13070
 URL: https://issues.apache.org/jira/browse/IMPALA-13070
 Project: IMPALA
  Issue Type: Documentation
Reporter: Quanlong Huang


We currently have 3 sections for "Impala Concepts":
 * Components of the Impala Server
 * Developing Impala Applications
 * How Impala Fits Into the Hadoop Ecosystem

[https://impala.apache.org/docs/build/asf-site-html/topics/impala_concepts.html]

It'd be helpful to introduce concepts used in the query plan and query 
execution, e.g.
 * Coordinator & Executor
 * Fragment & Fragment Instance
 * Operator, Pipeline
 * Cardinality, Memory Reservation, Memory Estimate
 * Split/ScanRange
 * Runtime Filter
 * Query Profile



--
This message was sent by Atlassian Jira
(v8.20.10#820010)





[jira] [Reopened] (IMPALA-11858) admissiond incorrectly caps memory limit to its process memory

2024-05-10 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang reopened IMPALA-11858:
-

> admissiond incorrectly caps memory limit to its process memory
> --
>
> Key: IMPALA-11858
> URL: https://issues.apache.org/jira/browse/IMPALA-11858
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Abhishek Rawat
>Assignee: Abhishek Rawat
>Priority: Critical
>
> When the admission controller is running as a separate daemon, it incorrectly caps 
> the query's memory limit to its own process limit. This behavior is also incorrect 
> when the admission controller runs in the coordinator, as executors 
> could have a different memory limit than the coordinator.
> https://github.com/apache/impala/blob/master/be/src/scheduling/schedule-state.cc#L312#L313



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Resolved] (IMPALA-11858) admissiond incorrectly caps memory limit to its process memory

2024-05-10 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang resolved IMPALA-11858.
-
Fix Version/s: Impala 4.3.0
   Resolution: Fixed

> admissiond incorrectly caps memory limit to its process memory
> --
>
> Key: IMPALA-11858
> URL: https://issues.apache.org/jira/browse/IMPALA-11858
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Abhishek Rawat
>Assignee: Abhishek Rawat
>Priority: Critical
> Fix For: Impala 4.3.0
>
>
> When the admission controller is running as a separate daemon, it incorrectly caps 
> the query's memory limit to its own process limit. This behavior is also incorrect 
> when the admission controller runs in the coordinator, as executors 
> could have a different memory limit than the coordinator.
> https://github.com/apache/impala/blob/master/be/src/scheduling/schedule-state.cc#L312#L313



--
This message was sent by Atlassian Jira
(v8.20.10#820010)






[jira] [Resolved] (IMPALA-11499) Refactor UrlEncode function to handle special characters

2024-05-10 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang resolved IMPALA-11499.
-
Fix Version/s: Impala 4.5.0
   Resolution: Fixed

Resolving this. Thanks, [~pranav.lodha]!

> Refactor UrlEncode function to handle special characters
> 
>
> Key: IMPALA-11499
> URL: https://issues.apache.org/jira/browse/IMPALA-11499
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Quanlong Huang
>Assignee: Pranav Yogi Lodha
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> Partition values are incorrectly URL-encoded in backend for unicode 
> characters, e.g. '运营业务数据' is encoded to '�%FFBF�营业务数据' which is wrong.
> To reproduce the issue, first create a partition table:
> {code:sql}
> create table my_part_tbl (id int) partitioned by (p string) stored as parquet;
> {code}
> Then insert data into it using partition values containing '运'. They will 
> fail:
> {noformat}
> [localhost:21050] default> insert into my_part_tbl partition(p='运营业务数据') 
> values (0);
> Query: insert into my_part_tbl partition(p='运营业务数据') values (0)
> Query submitted at: 2022-08-16 10:03:56 (Coordinator: 
> http://quanlong-OptiPlex-BJ:25000)
> Query progress can be monitored at: 
> http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=404ac3027c4b7169:39d16a2d
> ERROR: Error(s) moving partition files. First error (of 1) was: Hdfs op 
> (RENAME 
> hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/404ac3027c4b7169_39d16a2d/.404ac3027c4b7169-39d16a2d_1475855322_dir/p=�%FFBF�营业务数据/404ac3027c4b7169-39d16a2d_1585092794_data.0.parq
>  TO 
> hdfs://localhost:20500/test-warehouse/my_part_tbl/p=�%FFBF�营业务数据/404ac3027c4b7169-39d16a2d_1585092794_data.0.parq)
>  failed, error was: 
> hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/404ac3027c4b7169_39d16a2d/.404ac3027c4b7169-39d16a2d_1475855322_dir/p=�%FFBF�营业务数据/404ac3027c4b7169-39d16a2d_1585092794_data.0.parq
> Error(5): Input/output error
> [localhost:21050] default> insert into my_part_tbl partition(p='运') values 
> (0);
> Query: insert into my_part_tbl partition(p='运') values (0)
> Query submitted at: 2022-08-16 10:04:22 (Coordinator: 
> http://quanlong-OptiPlex-BJ:25000)
> Query progress can be monitored at: 
> http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=a64e5883473ec28d:86e7e335
> ERROR: Error(s) moving partition files. First error (of 1) was: Hdfs op 
> (RENAME 
> hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/a64e5883473ec28d_86e7e335/.a64e5883473ec28d-86e7e335_1582623091_dir/p=�%FFBF�/a64e5883473ec28d-86e7e335_163454510_data.0.parq
>  TO 
> hdfs://localhost:20500/test-warehouse/my_part_tbl/p=�%FFBF�/a64e5883473ec28d-86e7e335_163454510_data.0.parq)
>  failed, error was: 
> hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/a64e5883473ec28d_86e7e335/.a64e5883473ec28d-86e7e335_1582623091_dir/p=�%FFBF�/a64e5883473ec28d-86e7e335_163454510_data.0.parq
> Error(5): Input/output error
> {noformat}
> However, partition value without the character '运' is OK:
> {noformat}
> [localhost:21050] default> insert into my_part_tbl partition(p='营业务数据') 
> values (0);
> Query: insert into my_part_tbl partition(p='营业务数据') values (0)
> Query submitted at: 2022-08-16 10:04:13 (Coordinator: 
> http://quanlong-OptiPlex-BJ:25000)
> Query progress can be monitored at: 
> http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=b04894bfcfc3836a:b1ac9036
> Modified 1 row(s) in 0.21s
> {noformat}
> Hive is able to execute all these statements.
> I'm able to narrow down the issue to the Backend, where we URL-encode the 
> partition value in HdfsTableSink::InitOutputPartition():
> {code:cpp}
>   string value_str;
>   partition_key_expr_evals_[j]->PrintValue(value, &value_str);
>   // Directory names containing partition-key values need to be UrlEncoded, in
>   // particular to avoid problems when '/' is part of the key value (which might
>   // occur, for example, with date strings). Hive will URL decode the value
>   // transparently when Impala's frontend asks the metastore for partition key values,
>   // which makes it particularly important that we use the same encoding as Hive. It's
>   // also not necessary to encode the values when writing partition metadata. You can
>   // check this with 'show partitions <tbl>' in Hive, followed by a select from a
>   // decoded partition key value.
>   string encoded_str;
>   UrlEncode(value_str, &encoded_str, true);
>   string part_key_value = (encoded_str.empty() ?
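For comparison, an illustrative Python sketch (not Impala's actual fix; `encode_partition_value` is a hypothetical name): a Hive-compatible encoding must percent-encode each byte of the UTF-8 representation, so a multi-byte character like '运' becomes three %XX escapes rather than the mangled output shown in the errors above.

```python
from urllib.parse import quote

def encode_partition_value(value: str) -> str:
    # Percent-encode every byte of the UTF-8 representation. safe="" makes
    # '/' escape too, which matters when a partition value contains one.
    # This mirrors the intent of the backend's UrlEncode, not its exact API.
    return quote(value.encode("utf-8"), safe="")

# U+8FD0 ('运') is the three UTF-8 bytes E8 BF 90; each byte gets its own escape.
print(encode_partition_value("运"))          # %E8%BF%90
print(encode_partition_value("2022/08/16"))  # 2022%2F08%2F16
```

The corrupted directory names in the error output suggest the encoder was treating multi-byte characters as single mis-decoded bytes instead of escaping each UTF-8 byte like this.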


[jira] [Updated] (IMPALA-12688) Support JSON profile imports in webUI

2024-05-09 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-12688:

Fix Version/s: Impala 4.4.0

> Support JSON profile imports in webUI
> -
>
> Key: IMPALA-12688
> URL: https://issues.apache.org/jira/browse/IMPALA-12688
> Project: IMPALA
>  Issue Type: New Feature
>Reporter: Surya Hebbar
>Assignee: Surya Hebbar
>Priority: Major
> Fix For: Impala 4.4.0
>
> Attachments: clear_all_button.png, descending_order_start_time.png, 
> imported_profiles_section.png, imported_queries_button.png, 
> imported_queries_list.png, imported_queries_page.png, 
> imported_query_statement.png, imported_query_text_plan.png, 
> imported_query_timeline.png, multiple_query_profile_import.png
>
>
> It would be helpful for users to visualize the query timeline by selecting a 
> local JSON query profile.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Assigned] (IMPALA-10451) TestAvroSchemaResolution.test_avro_schema_resolution fails when bumping Hive to have HIVE-24157

2024-05-09 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang reassigned IMPALA-10451:
---

Assignee: Joe McDonnell  (was: Quanlong Huang)

> TestAvroSchemaResolution.test_avro_schema_resolution fails when bumping Hive 
> to have HIVE-24157
> ---
>
> Key: IMPALA-10451
> URL: https://issues.apache.org/jira/browse/IMPALA-10451
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Assignee: Joe McDonnell
>Priority: Major
>
> TestAvroSchemaResolution.test_avro_schema_resolution recently fails when 
> building against a Hive version with HIVE-24157.
> {code:java}
> query_test.test_avro_schema_resolution.TestAvroSchemaResolution.test_avro_schema_resolution[protocol:
>  beeswax | exec_option: \{'batch_size': 0, 'num_nodes': 0, 
> 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, 
> 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: 
> avro/snap/block] (from pytest)
> query_test/test_avro_schema_resolution.py:36: in test_avro_schema_resolution
>  self.run_test_case('QueryTest/avro-schema-resolution', vector, 
> unique_database)
> common/impala_test_suite.py:690: in run_test_case
>  self.__verify_results_and_errors(vector, test_section, result, use_db)
> common/impala_test_suite.py:523: in __verify_results_and_errors
>  replace_filenames_with_placeholder)
> common/test_result_verifier.py:456: in verify_raw_results
>  VERIFIER_MAP[verifier](expected, actual)
> common/test_result_verifier.py:278: in verify_query_result_is_equal
>  assert expected_results == actual_results
> E assert Comparing QueryTestResults (expected vs actual):
> E 10 != 0 
> {code}
> The failed query is
> {code:sql}
> select count(*) from functional_avro_snap.avro_coldef {code}
> The cause is that data loading for avro_coldef failed. The DML is
> {code:sql}
> INSERT OVERWRITE TABLE avro_coldef PARTITION(year=2014, month=1)
> SELECT bool_col, tinyint_col, smallint_col, int_col, bigint_col,
> float_col, double_col, date_string_col, string_col, timestamp_col
> FROM (select * from functional.alltypes order by id limit 5) a;
> {code}
> The failure (found in HS2) is:
> {code}
> 2021-01-24T01:52:16,340 ERROR [9433ee64-d706-4fa4-a146-18d71bf17013 
> HiveServer2-Handler-Pool: Thread-4946] parse.CalcitePlanner: CBO failed, 
> skipping CBO.
> org.apache.hadoop.hive.ql.exec.UDFArgumentException: Casting DATE/TIMESTAMP 
> types to NUMERIC is prohibited (hive.strict.timestamp.conversion)
>  at 
> org.apache.hadoop.hive.ql.udf.TimestampCastRestrictorResolver.getEvalMethod(TimestampCastRestrictorResolver.java:62)
>  ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
>  at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.initialize(GenericUDFBridge.java:168)
>  ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
>  at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:149)
>  ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
>  at 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:260)
>  ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
>  at 
> org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:292)
>  ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
>  at 
> org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getFuncExprNodeDescWithUdfData(TypeCheckProcFactory.java:987)
>  ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
>  at 
> org.apache.hadoop.hive.ql.parse.ParseUtils.createConversionCast(ParseUtils.java:163)
>  ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genConversionSelectOperator(SemanticAnalyzer.java:8551)
>  ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7908)
>  ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11100)
>  ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10972)
>  ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11901)
>  ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
>  at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11771)
>  ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169]
>  at 
> 

[jira] [Created] (IMPALA-13066) SHOW CREATE TABLE with stats and partitions

2024-05-09 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13066:
---

 Summary: SHOW CREATE TABLE with stats and partitions
 Key: IMPALA-13066
 URL: https://issues.apache.org/jira/browse/IMPALA-13066
 Project: IMPALA
  Issue Type: New Feature
  Components: Backend, Frontend
Reporter: Quanlong Huang


SHOW CREATE TABLE produces the statement to create the table. In practice, we 
often also want the column stats and partitions. It'd be helpful to add an option 
that also produces the ADD PARTITION and SET COLUMN STATS statements. E.g.
{code:sql}
SHOW CREATE TABLE my_tbl WITH STATS;{code}
produces
{code:sql}
CREATE TABLE my_tbl ...;
ALTER TABLE my_tbl ADD PARTITION ...;
ALTER TABLE my_tbl PARTITION (...) SET TBLPROPERTIES('numRows'='3', 
'STATS_GENERATED_VIA_STATS_TASK'='true');
ALTER TABLE my_tbl SET COLUMN STATS c1 
('numDVs'='19','numNulls'='0','maxSize'='8','avgSize'='8');
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)






[jira] [Created] (IMPALA-13065) Introduce package scripts to launch Impala processes

2024-05-08 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13065:
---

 Summary: Introduce package scripts to launch Impala processes
 Key: IMPALA-13065
 URL: https://issues.apache.org/jira/browse/IMPALA-13065
 Project: IMPALA
  Issue Type: Documentation
Reporter: Quanlong Huang


We should add documentation on how to use the scripts installed by the RPM/DEB 
packages at
https://impala.apache.org/docs/build/html/topics/impala_processes.html

CC [~yx91490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (IMPALA-13064) Install services from RPM/DEB packages

2024-05-08 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13064:
---

 Summary: Install services from RPM/DEB packages
 Key: IMPALA-13064
 URL: https://issues.apache.org/jira/browse/IMPALA-13064
 Project: IMPALA
  Issue Type: New Feature
  Components: Infrastructure
Reporter: Quanlong Huang


Our doc mentions using the {{service}} command to start Impala processes:
https://impala.apache.org/docs/build/html/topics/impala_processes.html

Start the statestore service using a command similar to the following:
{code}
$ sudo service impala-state-store start{code}
Start the catalog service using a command similar to the following:
{code}
$ sudo service impala-catalog start{code}
Start the Impala daemon services using a command similar to the following:
{code}
$ sudo service impala-server start{code}

The RPM/DEB packages should install these services and launch the processes as 
the 'impala' user.
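One way packages commonly wire this up is a systemd unit that runs the daemon as the 'impala' user. A hedged sketch, not the actual packaging work; the unit name, binary path, and flag file below are assumptions:

```ini
# /usr/lib/systemd/system/impala-catalog.service (illustrative paths)
[Unit]
Description=Impala Catalog Service
After=network.target

[Service]
User=impala
Group=impala
ExecStart=/opt/impala/bin/catalogd --flagfile=/etc/impala/catalogd.flags
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

With such a unit installed, `sudo systemctl start impala-catalog` would replace the `service impala-catalog start` invocation above.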

CC [~yx91490] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)






[jira] [Commented] (IMPALA-13034) Add logs for slow HTTP requests dumping the profile

2024-05-08 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844655#comment-17844655
 ] 

Quanlong Huang commented on IMPALA-13034:
-

Uploaded a patch to add logs and counters first: 
[https://gerrit.cloudera.org/c/21412/]

With that we can identify the issue and find users that send abusive HTTP requests.

Filed IMPALA-13063 for a fix for the issue. We can discuss the solutions there.
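The general pattern behind such a warning (a Python sketch of the idea, not the actual patch at the Gerrit link; names and the threshold are assumptions) is to time each handler and log the client's identity when it runs too long:

```python
import logging
import time
from functools import wraps

SLOW_REQUEST_THRESHOLD_S = 0.1  # illustrative threshold, not Impala's value

def log_if_slow(handler):
    """Wrap an HTTP handler; warn with client info when it runs too long."""
    @wraps(handler)
    def wrapper(remote_ip, *args, **kwargs):
        start = time.monotonic()
        try:
            return handler(remote_ip, *args, **kwargs)
        finally:
            elapsed = time.monotonic() - start
            if elapsed > SLOW_REQUEST_THRESHOLD_S:
                logging.warning("Slow HTTP request: handler=%s client=%s took %.3fs",
                                handler.__name__, remote_ip, elapsed)
    return wrapper

@log_if_slow
def query_profile(remote_ip, query_id):
    # Stand-in for serializing a large profile while holding the query's lock.
    time.sleep(0.2)
    return "profile of " + query_id

query_profile("10.0.0.5", "404a:39d1")  # exceeds the threshold: logs a warning
```

Measuring in a `finally` block means the warning fires even when the handler raises, which matters for diagnosing requests that time out.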

> Add logs for slow HTTP requests dumping the profile
> ---
>
> Key: IMPALA-13034
> URL: https://issues.apache.org/jira/browse/IMPALA-13034
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> There are several endpoints in WebUI that can dump a query profile: 
> /query_profile, /query_profile_encoded, /query_profile_plain_text, 
> /query_profile_json
> The HTTP handler thread goes into ImpalaServer::GetRuntimeProfileOutput(), 
> which acquires the lock of the ClientRequestState. This could block client 
> requests fetching query results. We should add warning logs when such HTTP 
> requests run slow (e.g. when the profile is too large to download in a short 
> time). The IP address and other info of such requests should also be logged.
> Related code:
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-server.cc#L736
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-beeswax-server.cc#L601
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-hs2-server.cc#L207



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Created] (IMPALA-13063) HTTP requests on in-flight queries blocks query execution in coordinator side

2024-05-08 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13063:
---

 Summary: HTTP requests on in-flight queries blocks query execution 
in coordinator side
 Key: IMPALA-13063
 URL: https://issues.apache.org/jira/browse/IMPALA-13063
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Quanlong Huang


This is a follow-up task for IMPALA-13034.

HTTP requests on in-flight queries usually acquire the lock of the 
ClientRequestState. This could block client requests fetching query 
results. E.g. there are several endpoints in the WebUI that can dump a query 
profile: /query_profile, /query_profile_encoded, /query_profile_plain_text, 
/query_profile_json. If the profile is huge, such requests impact query 
performance.

Fetching the details (profile, exec summary, etc.) of an in-flight query has 
lower priority and shouldn't block query execution.
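One common way to keep such low-priority readers from stalling the query path (a sketch of the idea only, not Impala's code; the class and method names are invented) is a try-lock that gives up quickly instead of queuing behind the execution thread:

```python
import threading

class ClientRequestState:
    """Toy stand-in for the coordinator state guarding results and profile."""
    def __init__(self):
        self.lock = threading.Lock()
        self.profile = "exec summary and counters ..."

    def dump_profile_low_priority(self, timeout_s=0.05):
        # Low-priority path: try briefly, then back off instead of blocking
        # the fetch path that currently holds the lock.
        if not self.lock.acquire(timeout=timeout_s):
            return None  # WebUI can show "query busy, retry" instead of stalling it
        try:
            return self.profile
        finally:
            self.lock.release()

state = ClientRequestState()
print(state.dump_profile_low_priority())  # lock is free: returns the profile
with state.lock:  # simulate the coordinator holding the lock during a fetch
    print(state.dump_profile_low_priority(timeout_s=0.01))  # prints None
```

The design choice here is to shift the cost of contention onto the WebUI caller, who can retry, rather than onto the client fetching results, who cannot.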



--
This message was sent by Atlassian Jira
(v8.20.10#820010)






[jira] [Updated] (IMPALA-9577) Use `system_unsync` time for Kudu test clusters

2024-05-07 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-9577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-9577:
---
Fix Version/s: Impala 4.0.0
   (was: Impala 3.4.0)

> Use `system_unsync` time for Kudu test clusters
> ---
>
> Key: IMPALA-9577
> URL: https://issues.apache.org/jira/browse/IMPALA-9577
> Project: IMPALA
>  Issue Type: Improvement
>Reporter: Grant Henke
>Assignee: Grant Henke
>Priority: Major
> Fix For: Impala 4.0.0
>
>
> Recently Kudu made enhancements to time source configuration and adjusted the 
> time source for local clusters/tests to `system_unsync`. Impala should mirror 
> that behavior in Impala test clusters, given there is no need to require an 
> NTP-synchronized clock for a test where all the participating Kudu masters 
> and tablet servers run on the same node using the same local wallclock.
>  
> See the Kudu commit here for details: 
> [https://github.com/apache/kudu/commit/eb2b70d4b96be2fc2fdd6b3625acc284ac5774be]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Updated] (IMPALA-13035) Querying metadata tables from non-Iceberg tables throws IllegalArgumentException

2024-05-06 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-13035:

Fix Version/s: Impala 4.5.0

> Querying metadata tables from non-Iceberg tables throws 
> IllegalArgumentException
> 
>
> Key: IMPALA-13035
> URL: https://issues.apache.org/jira/browse/IMPALA-13035
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 4.3.0
>Reporter: Peter Rozsa
>Assignee: Daniel Becker
>Priority: Minor
>  Labels: impala-iceberg
> Fix For: Impala 4.5.0
>
>
> If a query targets an Iceberg metadata table like default.xy.`files` and the 
> xy table is not an Iceberg table, then the analyzer throws an 
> IllegalArgumentException.
> The main concern is that IcebergMetadataTable.java:isIcebergMetadataTable is 
> called before it's validated that the table is indeed an IcebergTable.
> Example: 
> {code:java}
> create table xy(a int);
> select * from default.xy.`files`;{code}
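The ordering problem described above can be sketched as follows. This is a hypothetical C++ illustration (the actual analyzer code is Java, in IcebergMetadataTable.java; the names below are made up): the metadata-table check must come after verifying the base table is an Iceberg table.

```cpp
#include <string>

// Hypothetical sketch: verify the base table is an Iceberg table *before*
// treating a reference like db.tbl.`files` as an Iceberg metadata table, so
// that non-Iceberg tables get a clean analysis error instead of an
// IllegalArgumentException.
enum class TableKind { kHdfs, kIceberg };

bool IsIcebergMetadataTableRef(TableKind base_kind,
                               const std::string& vtable_name) {
  if (base_kind != TableKind::kIceberg) return false;  // validate type first
  return vtable_name == "files" || vtable_name == "snapshots" ||
         vtable_name == "history";
}
```

With the check ordered this way, the `files` reference on a plain HDFS table is simply rejected rather than crashing the analyzer.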



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-13009) Potential leak of partition deletions in the catalog topic

2024-05-06 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang resolved IMPALA-13009.
-
Fix Version/s: Impala 4.5.0
   Resolution: Fixed

> Potential leak of partition deletions in the catalog topic
> --
>
> Key: IMPALA-13009
> URL: https://issues.apache.org/jira/browse/IMPALA-13009
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Affects Versions: Impala 4.0.0, Impala 4.1.0, Impala 4.2.0, Impala 4.1.1, 
> Impala 4.1.2, Impala 4.3.0
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
> Fix For: Impala 4.5.0
>
>
> Catalogd might not send partition deletions to the catalog topic in the 
> following scenario:
> * Some partitions of a table are dropped.
> * The HdfsTable object is subsequently removed before catalogd collects the 
> dropped partitions.
> In that case, catalogd loses track of the dropped partitions, so their 
> updates remain in the catalog topic until the partition names are reused 
> again.
> Note that the HdfsTable object can be removed by commands like DropTable or 
> INVALIDATE.
> The leaked partitions are detected when a coordinator restarts: an 
> IllegalStateException complaining about stale partitions is reported, and the 
> table is not added to the coordinator's catalog cache.
> {noformat}
> E0417 16:41:22.317298 20746 ImpaladCatalog.java:264] Error adding catalog 
> object: Received stale partition in a statestore update: 
> THdfsPartition(partitionKeyExprs:[TExpr(nodes:[TExprNode(node_type:INT_LITERAL,
>  type:TColumnType(types:[TTypeNode(type:SCALAR, 
> scalar_type:TScalarType(type:INT))]), num_children:0, is_constant:true, 
> int_literal:TIntLiteral(value:106), is_codegen_disabled:false)])], 
> location:THdfsPartitionLocation(prefix_index:0, suffix:p=106), id:138, 
> file_desc:[THdfsFileDesc(file_desc_data:18 00 00 00 00 00 00 00 00 00 0E 00 
> 1C 00 18 00 10 00 00 00 08 00 04 00 0E 00 00 00 18 00 00 00 8B 0E 2D EB 8E 01 
> 00 00 04 00 00 00 00 00 00 00 0C 00 00 00 01 00 00 00 4C 00 00 00 36 00 00 00 
> 34 34 34 37 62 35 66 34 62 30 65 64 66 64 65 31 2D 32 33 33 61 64 62 38 35 30 
> 30 30 30 30 30 30 30 5F 36 36 34 31 30 39 33 37 33 5F 64 61 74 61 2E 30 2E 74 
> 78 74 00 00 0C 00 14 00 00 00 0C 00...)], access_level:READ_WRITE, 
> stats:TTableStats(num_rows:-1), is_marked_cached:false, 
> hms_parameters:{transient_lastDdlTime=1713342582, totalSize=4, 
> numFilesErasureCoded=0, numFiles=1}, num_blocks:1, total_file_size_bytes:4, 
> has_incremental_stats:false, write_id:0, db_name:default, tbl_name:my_part, 
> partition_name:p=106, 
> hdfs_storage_descriptor:THdfsStorageDescriptor(lineDelim:10, fieldDelim:1, 
> collectionDelim:1, mapKeyDelim:1, escapeChar:0, quoteChar:1, fileFormat:TEXT, 
> blockSize:0))
> Java exception follows:
> java.lang.IllegalStateException: Received stale partition in a statestore 
> update: 
> THdfsPartition(partitionKeyExprs:[TExpr(nodes:[TExprNode(node_type:INT_LITERAL,
>  type:TColumnType(types:[TTypeNode(type:SCALAR, 
> scalar_type:TScalarType(type:INT))]), num_children:0, is_constant:true, 
> int_literal:TIntLiteral(value:106), is_codegen_disabled:false)])], 
> location:THdfsPartitionLocation(prefix_index:0, suffix:p=106), id:138, 
> file_desc:[THdfsFileDesc(file_desc_data:18 00 00 00 00 00 00 00 00 00 0E 00 
> 1C 00 18 00 10 00 00 00 08 00 04 00 0E 00 00 00 18 00 00 00 8B 0E 2D EB 8E 01 
> 00 00 04 00 00 00 00 00 00 00 0C 00 00 00 01 00 00 00 4C 00 00 00 36 00 00 00 
> 34 34 34 37 62 35 66 34 62 30 65 64 66 64 65 31 2D 32 33 33 61 64 62 38 35 30 
> 30 30 30 30 30 30 30 5F 36 36 34 31 30 39 33 37 33 5F 64 61 74 61 2E 30 2E 74 
> 78 74 00 00 0C 00 14 00 00 00 0C 00...)], access_level:READ_WRITE, 
> stats:TTableStats(num_rows:-1), is_marked_cached:false, 
> hms_parameters:{transient_lastDdlTime=1713342582, totalSize=4, 
> numFilesErasureCoded=0, numFiles=1}, num_blocks:1, total_file_size_bytes:4, 
> has_incremental_stats:false, write_id:0, db_name:default, tbl_name:my_part, 
> partition_name:p=106, 
> hdfs_storage_descriptor:THdfsStorageDescriptor(lineDelim:10, fieldDelim:1, 
> collectionDelim:1, mapKeyDelim:1, escapeChar:0, quoteChar:1, fileFormat:TEXT, 
> blockSize:0))
> at 
> com.google.common.base.Preconditions.checkState(Preconditions.java:512)
> at 
> org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:523)
> at 
> org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334)
> at 
> org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262)
> at 
> org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:120)
> at 
> 


[jira] [Assigned] (IMPALA-13034) Add logs for slow HTTP requests dumping the profile

2024-05-06 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang reassigned IMPALA-13034:
---

Assignee: Quanlong Huang

> Add logs for slow HTTP requests dumping the profile
> ---
>
> Key: IMPALA-13034
> URL: https://issues.apache.org/jira/browse/IMPALA-13034
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
>
> There are several endpoints in the WebUI that can dump a query profile: 
> /query_profile, /query_profile_encoded, /query_profile_plain_text, 
> /query_profile_json.
> The HTTP handler thread goes into ImpalaServer::GetRuntimeProfileOutput(), 
> which acquires the lock of the ClientRequestState. This can block client 
> requests fetching query results. We should add warning logs when such HTTP 
> requests run slow (e.g. when the profile is too large to download in a short 
> time). The IP address and other info of such requests should also be logged.
> Related codes:
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-server.cc#L736
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-beeswax-server.cc#L601
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-hs2-server.cc#L207
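A minimal sketch of the requested logging, with hypothetical names (this is not Impala's actual API): measure the handler's wall time and emit a warning that includes the client address when the request runs slow.

```cpp
#include <chrono>
#include <functional>
#include <string>

// Illustrative sketch only: run an HTTP handler (e.g. one dumping a query
// profile), measure its wall time, and produce a warning message containing
// the client's address when it exceeds a threshold. Returns the elapsed
// milliseconds; the warning text is written to *warning_out when triggered.
long long RunHandlerLoggingSlow(const std::string& client_addr,
                                const std::function<void()>& handler,
                                long long warn_threshold_ms,
                                std::string* warning_out) {
  auto start = std::chrono::steady_clock::now();
  handler();
  long long elapsed_ms = std::chrono::duration_cast<std::chrono::milliseconds>(
      std::chrono::steady_clock::now() - start).count();
  if (elapsed_ms > warn_threshold_ms && warning_out != nullptr) {
    *warning_out = "Slow profile request from " + client_addr + ": " +
                   std::to_string(elapsed_ms) + " ms";
  }
  return elapsed_ms;
}
```

In the real server the warning would go through the logging framework rather than a string out-parameter; the point is only that timing plus client info is cheap to capture around the existing handler.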



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-12795) TestWebPage.test_catalog_operation_fields is flaky

2024-05-06 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang resolved IMPALA-12795.
-
Fix Version/s: Impala 4.4.0
   Resolution: Fixed

> TestWebPage.test_catalog_operation_fields is flaky
> --
>
> Key: IMPALA-12795
> URL: https://issues.apache.org/jira/browse/IMPALA-12795
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Critical
> Fix For: Impala 4.4.0
>
>
> Saw the test failed in an internal job:
> {noformat}
> webserver/test_web_pages.py:942: in test_catalog_operation_fields
> assert matched
> E   assert False{noformat}
> That means the CREATE DATABASE statement was not found in the coordinator 
> webUI.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)





[jira] [Commented] (IMPALA-13033) impala-profile-tool should support parsing thrift profiles downloaded from WebUI

2024-05-05 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843633#comment-17843633
 ] 

Quanlong Huang commented on IMPALA-13033:
-

We can make it more robust by handling the "liness.fail()" error, e.g. by 
assigning "line" to "encoded_profile" directly.
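That suggestion can be sketched as below. The function name is hypothetical; the parsing mirrors the snippet quoted from impala-profile-tool.cc, with the fallback added.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical sketch of the suggested fix: if the line does not parse as
// "<timestamp> <query_id> <encoded_profile>" (the profile-log format), fall
// back to treating the whole line as the encoded profile, which is the
// format of thrift profiles downloaded from the WebUI.
std::string ParseProfileLine(const std::string& line) {
  std::istringstream liness(line);
  int64_t timestamp;
  std::string query_id, encoded_profile;
  liness >> timestamp >> query_id >> encoded_profile;
  if (liness.fail()) {
    // Fallback: the whole line is the encoded profile.
    return line;
  }
  return encoded_profile;
}
```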

> impala-profile-tool should support parsing thrift profiles downloaded from 
> WebUI
> 
>
> Key: IMPALA-13033
> URL: https://issues.apache.org/jira/browse/IMPALA-13033
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Quanlong Huang
>Assignee: Anshula Jain
>Priority: Major
>  Labels: newbie, ramp-up
>
> In the coordinator WebUI, users can download query profiles in 
> text/json/thrift formats. The thrift profile is the same as one line in the 
> profile log without the timestamp and query id at the beginning.
> impala-profile-tool fails to parse such a file. It should retry parsing the 
> whole line as the encoded profile. Current code snippet:
> {code:cpp}
> // Parse out fields from the line.
> istringstream liness(line);
> int64_t timestamp;
> string query_id, encoded_profile;
> liness >> timestamp >> query_id >> encoded_profile;
> if (liness.fail()) {
>   cerr << "Error parsing line " << lineno << ": '" << line << "'\n";
>   ++errors;
>   continue;
> }{code}
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/util/impala-profile-tool.cc#L109



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-13044) Upgrade bouncycastle to 1.78

2024-05-05 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-13044:

 Fix Version/s: Impala 4.5.0
Target Version:   (was: Impala 4.4.0)

> Upgrade bouncycastle to 1.78
> 
>
> Key: IMPALA-13044
> URL: https://issues.apache.org/jira/browse/IMPALA-13044
> Project: IMPALA
>  Issue Type: Task
>  Components: Frontend
>Affects Versions: Impala 4.3.0
>Reporter: Peter Rozsa
>Assignee: Peter Rozsa
>Priority: Major
> Fix For: Impala 4.5.0
>
>
> Impala uses bouncycastle 1.68, which contains various CVEs. Upgrading to 1.78 
> resolves these security concerns.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-13047) Support restarting a specified impalad in bin/start-impala-cluster.py

2024-04-30 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842316#comment-17842316
 ] 

Quanlong Huang commented on IMPALA-13047:
-

Uploaded a patch for review: https://gerrit.cloudera.org/c/21376/

> Support restarting a specified impalad in bin/start-impala-cluster.py
> -
>
> Key: IMPALA-13047
> URL: https://issues.apache.org/jira/browse/IMPALA-13047
> Project: IMPALA
>  Issue Type: New Feature
>  Components: Infrastructure
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
>
> Currently, bin/start-impala-cluster.py can restart catalogd, statestored and 
> *all* impalads. It'd be useful to support only restarting one impalad. We 
> need this in the debug of IMPALA-13009.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Resolved] (IMPALA-12835) Transactional tables are unsynced when hms_event_incremental_refresh_transactional_table is disabled

2024-04-30 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang resolved IMPALA-12835.
-
Fix Version/s: Impala 4.4.0
   Resolution: Fixed

Resolving this. Thanks, [~csringhofer]!

> Transactional tables are unsynced when 
> hms_event_incremental_refresh_transactional_table is disabled
> 
>
> Key: IMPALA-12835
> URL: https://issues.apache.org/jira/browse/IMPALA-12835
> Project: IMPALA
>  Issue Type: Bug
>  Components: Catalog
>Reporter: Quanlong Huang
>Assignee: Csaba Ringhofer
>Priority: Critical
> Fix For: Impala 4.4.0
>
>
> There are some test failures when 
> hms_event_incremental_refresh_transactional_table is disabled:
>  * 
> tests/metadata/test_event_processing.py::TestEventProcessing::test_transactional_insert_events
>  * 
> tests/metadata/test_event_processing.py::TestEventProcessing::test_event_based_replication
> I can reproduce the issue locally:
> {noformat}
> $ bin/start-impala-cluster.py 
> --catalogd_args=--hms_event_incremental_refresh_transactional_table=false
> impala-shell> create table txn_tbl (id int, val int) stored as parquet 
> tblproperties 
> ('transactional'='true','transactional_properties'='insert_only');
> impala-shell> describe txn_tbl;  -- make the table loaded in Impala
> hive> insert into txn_tbl values(101, 200);
> impala-shell> select * from txn_tbl; {noformat}
> Impala shows no results until a REFRESH runs on this table.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org





[jira] [Created] (IMPALA-13047) Support restarting a specified impalad in bin/start-impala-cluster.py

2024-04-29 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13047:
---

 Summary: Support restarting a specified impalad in 
bin/start-impala-cluster.py
 Key: IMPALA-13047
 URL: https://issues.apache.org/jira/browse/IMPALA-13047
 Project: IMPALA
  Issue Type: New Feature
  Components: Infrastructure
Reporter: Quanlong Huang
Assignee: Quanlong Huang


Currently, bin/start-impala-cluster.py can restart catalogd, statestored and 
*all* impalads. It'd be useful to support only restarting one impalad. We need 
this in the debug of IMPALA-13009.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org





[jira] [Updated] (IMPALA-12917) Several tests in TestEventProcessingError fail

2024-04-29 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-12917:

Fix Version/s: Impala 4.4.0

> Several tests in TestEventProcessingError fail
> --
>
> Key: IMPALA-12917
> URL: https://issues.apache.org/jira/browse/IMPALA-12917
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Daniel Becker
>Assignee: Venugopal Reddy K
>Priority: Blocker
>  Labels: broken-build
> Fix For: Impala 4.4.0
>
>
> The failing tests are
> TestEventProcessingError.test_event_processor_error_alter_partition
> TestEventProcessingError.test_event_processor_error_alter_partitions
> TestEventProcessingError.test_event_processor_error_commit_compaction_event
> TestEventProcessingError.test_event_processor_error_commit_txn
> TestEventProcessingError.test_event_processor_error_stress_test
> Stacktrace:
> {code:java}
> E   Error: Error while compiling statement: FAILED: Execution Error, return 
> code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask. 
> java.lang.NullPointerException
> E at org.apache.tez.client.TezClient.cleanStagingDir(TezClient.java:424)
> E at org.apache.tez.client.TezClient.start(TezClient.java:413)
> E at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.startSessionAndContainers(TezSessionState.java:556)
> E at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.openInternal(TezSessionState.java:387)
> E at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:302)
> E at 
> org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.open(TezSessionPoolSession.java:106)
> E at 
> org.apache.hadoop.hive.ql.exec.tez.TezTask.ensureSessionHasResources(TezTask.java:468)
> E at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:227)
> E at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213)
> E at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105)
> E at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:356)
> E at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:329)
> E at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246)
> E at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:107)
> E at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:809)
> E at org.apache.hadoop.hive.ql.Driver.run(Driver.java:546)
> E at org.apache.hadoop.hive.ql.Driver.run(Driver.java:540)
> E at 
> org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:190)
> E at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235)
> E at 
> org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92)
> E at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:340)
> E at java.security.AccessController.doPrivileged(Native Method)
> E at javax.security.auth.Subject.doAs(Subject.java:422)
> E at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> E at 
> org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:360)
> E at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> E at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> E at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> E at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> E at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> E at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> E at java.lang.Thread.run(Thread.java:748) (state=08S01,code=1)
> {code}
> These tests were introduced by IMPALA-12832. [~VenuReddy], could you take a 
> look?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Updated] (IMPALA-13041) Support Reading and Writing Puffin File Stats for Iceberg Tables

2024-04-28 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-13041:

Labels: catalog-2024 impala-iceberg  (was: impala-iceberg)

> Support Reading and Writing Puffin File Stats for Iceberg Tables
> 
>
> Key: IMPALA-13041
> URL: https://issues.apache.org/jira/browse/IMPALA-13041
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Backend, Frontend
>Reporter: Manish Maheshwari
>Priority: Major
>  Labels: catalog-2024, impala-iceberg
>
> The Puffin file format is an upstream Iceberg spec for storing stats of 
> Iceberg tables. Today Impala can neither read these stats for query planning 
> nor write them. We want to extend Impala to:
>  # Read stats from Puffin files
>  # Write stats to Puffin files during load/insert/update/delete commands (as 
> applicable)
>  # Modify the COMPUTE STATS command for Iceberg tables to compute stats and 
> store them in Puffin files
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Assigned] (IMPALA-10848) Provide compile-only option to skip downloading test dependencies

2024-04-28 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang reassigned IMPALA-10848:
---

Assignee: Quanlong Huang  (was: XiangYang)

OK, assigning this to myself.

> Provide compile-only option to skip downloading test dependencies
> -
>
> Key: IMPALA-10848
> URL: https://issues.apache.org/jira/browse/IMPALA-10848
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Infrastructure
>Reporter: Quanlong Huang
>Assignee: Quanlong Huang
>Priority: Major
> Attachments: pywebhdfs_failure.png
>
>
> Compiling Impala is not easy for a beginner. A portion of the failures occur 
> in downloading/installing dependencies.
> For instance, old versions of Impala may fail to compile since cdh components 
> of old GBNs on S3 are removed. However, the cdh component artifacts are only 
> used in testing (minicluster & holding testdata). We can still compile 
> without them.
> Take pip dependencies as another example: here is a failure I got from a 
> community user. It failed while installing pywebhdfs:
> !pywebhdfs_failure.png!
> However, a simple git-grep shows that pywebhdfs is only used in tests:
> {code:bash}
> $ git grep pywebhdfs
> bin/bootstrap_system.sh:#  >>> from pywebhdfs.webhdfs import PyWebHdfsClient
> infra/python/deps/requirements.txt:pywebhdfs == 0.3.2
> tests/common/impala_test_suite.py:    #     HDFS: uses a mixture of pywebhdfs 
> (which is faster than the HDFS CLI) and the
> tests/util/hdfs_util.py:from pywebhdfs.webhdfs import PyWebHdfsClient, 
> errors, _raise_pywebhdfs_exception
> tests/util/hdfs_util.py:      
> _raise_pywebhdfs_exception(response.status_code, response.text)
> tests/util/hdfs_util.py:      
> _raise_pywebhdfs_exception(response.status_code, response.text)
> tests/util/hdfs_util.py:      
> _raise_pywebhdfs_exception(response.status_code, response.text)
> tests/util/hdfs_util.py:      
> _raise_pywebhdfs_exception(response.status_code, response.text) {code}
> If the user just wants to compile Impala and deploy it in their existing 
> Hadoop cluster, dealing with these failures is a waste of their time.
> *Target for this JIRA*
>  * Provide a compile-only option to bin/bootstrap_system.sh. It should skip 
> downloading/installing unused dependencies like postgresql.
>  * Provide a compile-only option to buildall.sh. It should skip downloading 
> unused cdh/cdp components in compilation.
>  * Update our 
> [wiki|https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala] 
> about this.
> Note that we already have some env vars to control the download behaviors, 
> e.g. SKIP_PYTHON_DOWNLOAD and SKIP_TOOLCHAIN_BOOTSTRAP. We just need to make 
> the compile-only scenario work with minimal requirements and document it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Commented] (IMPALA-11499) Refactor UrlEncode function to handle special characters

2024-04-26 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841255#comment-17841255
 ] 

Quanlong Huang commented on IMPALA-11499:
-

[~daniel.becker] found the root cause in the review: 
[https://gerrit.cloudera.org/c/21131/6/be/src/util/coding-util.cc#55]
The problem is in this string:
{code:cpp}
static function HiveShouldEscape = is_any_of("\"#%\\*/:=?\u00FF");{code}
"\u00FF" is the Unicode character ÿ, which is encoded as two bytes in UTF-8: 0xc3 {*}0xbf{*}.
"运" is encoded as three bytes in UTF-8: 0xe8 *0xbf* 0x90. The second byte *0xbf* matches the set, so it's encoded as "%FFBF". The other bytes remain unchanged. That's the problem.

More common Chinese characters whose UTF-8 encoding contains the byte 0xbf hit the same problem, e.g.
* 近: 0xe8 0xbf 0x91
* 返: 0xe8 0xbf 0x94
* 还: 0xe8 0xbf 0x98
* 这: 0xe8 0xbf 0x99
* 进: 0xe8 0xbf 0x9b
* 远: 0xe8 0xbf 0x9c
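The effect of that stray 0xbf byte can be reproduced with a self-contained sketch (illustrative C++ only, not Impala's actual UrlEncode code; the function names here are made up):

```cpp
#include <cstdio>
#include <set>
#include <string>

// Illustrative sketch only -- not Impala's actual code. The escape set
// is built byte-wise from the string literal quoted above. In UTF-8,
// U+00FF (ÿ) is the two bytes 0xC3 0xBF, so the lone continuation byte
// 0xBF lands in the set by itself.
static std::set<unsigned char> EscapeSet() {
  const std::string chars = "\"#%\\*/:=?\u00FF";  // assumes UTF-8 execution charset
  return std::set<unsigned char>(chars.begin(), chars.end());
}

// Byte-wise escaping: any byte in the set becomes %XX, the rest pass
// through. "运" (0xE8 0xBF 0x90) is corrupted because its middle byte
// 0xBF is escaped in isolation, leaving two orphaned bytes around it.
std::string BuggyEscape(const std::string& in) {
  static const std::set<unsigned char> kEscape = EscapeSet();
  std::string out;
  for (unsigned char b : in) {
    if (kEscape.count(b) > 0) {
      char buf[8];
      std::snprintf(buf, sizeof(buf), "%%%02X", b);
      out += buf;
    } else {
      out.push_back(static_cast<char>(b));
    }
  }
  return out;
}
```

BuggyEscape("运") yields the byte 0xE8, then "%BF", then 0x90: the escaped middle byte plus two bytes that no longer form valid UTF-8, which is the same class of corruption as the mangled "�%FFBF�..." paths in the issue. Matching whole characters instead of individual bytes avoids this.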

> Refactor UrlEncode function to handle special characters
> 
>
> Key: IMPALA-11499
> URL: https://issues.apache.org/jira/browse/IMPALA-11499
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Quanlong Huang
>Assignee: Pranav Yogi Lodha
>Priority: Critical
>
> Partition values are incorrectly URL-encoded in backend for unicode 
> characters, e.g. '运营业务数据' is encoded to '�%FFBF�营业务数据' which is wrong.
> To reproduce the issue, first create a partition table:
> {code:sql}
> create table my_part_tbl (id int) partitioned by (p string) stored as parquet;
> {code}
> Then insert data into it using partition values containing '运'. They will 
> fail:
> {noformat}
> [localhost:21050] default> insert into my_part_tbl partition(p='运营业务数据') 
> values (0);
> Query: insert into my_part_tbl partition(p='运营业务数据') values (0)
> Query submitted at: 2022-08-16 10:03:56 (Coordinator: 
> http://quanlong-OptiPlex-BJ:25000)
> Query progress can be monitored at: 
> http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=404ac3027c4b7169:39d16a2d
> ERROR: Error(s) moving partition files. First error (of 1) was: Hdfs op 
> (RENAME 
> hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/404ac3027c4b7169_39d16a2d/.404ac3027c4b7169-39d16a2d_1475855322_dir/p=�%FFBF�营业务数据/404ac3027c4b7169-39d16a2d_1585092794_data.0.parq
>  TO 
> hdfs://localhost:20500/test-warehouse/my_part_tbl/p=�%FFBF�营业务数据/404ac3027c4b7169-39d16a2d_1585092794_data.0.parq)
>  failed, error was: 
> hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/404ac3027c4b7169_39d16a2d/.404ac3027c4b7169-39d16a2d_1475855322_dir/p=�%FFBF�营业务数据/404ac3027c4b7169-39d16a2d_1585092794_data.0.parq
> Error(5): Input/output error
> [localhost:21050] default> insert into my_part_tbl partition(p='运') values 
> (0);
> Query: insert into my_part_tbl partition(p='运') values (0)
> Query submitted at: 2022-08-16 10:04:22 (Coordinator: 
> http://quanlong-OptiPlex-BJ:25000)
> Query progress can be monitored at: 
> http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=a64e5883473ec28d:86e7e335
> ERROR: Error(s) moving partition files. First error (of 1) was: Hdfs op 
> (RENAME 
> hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/a64e5883473ec28d_86e7e335/.a64e5883473ec28d-86e7e335_1582623091_dir/p=�%FFBF�/a64e5883473ec28d-86e7e335_163454510_data.0.parq
>  TO 
> hdfs://localhost:20500/test-warehouse/my_part_tbl/p=�%FFBF�/a64e5883473ec28d-86e7e335_163454510_data.0.parq)
>  failed, error was: 
> hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/a64e5883473ec28d_86e7e335/.a64e5883473ec28d-86e7e335_1582623091_dir/p=�%FFBF�/a64e5883473ec28d-86e7e335_163454510_data.0.parq
> Error(5): Input/output error
> {noformat}
> However, partition value without the character '运' is OK:
> {noformat}
> [localhost:21050] default> insert into my_part_tbl partition(p='营业务数据') 
> values (0);
> Query: insert into my_part_tbl partition(p='营业务数据') values (0)
> Query submitted at: 2022-08-16 10:04:13 (Coordinator: 
> http://quanlong-OptiPlex-BJ:25000)
> Query progress can be monitored at: 
> http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=b04894bfcfc3836a:b1ac9036
> Modified 1 row(s) in 0.21s
> {noformat}
> Hive is able to execute all these statements.
> I was able to narrow the issue down to the backend, where we URL-encode the 
> partition value in HdfsTableSink::InitOutputPartition():
> {code:cpp}
>   string value_str;
>   partition_key_expr_evals_[j]->PrintValue(value, &value_str);
>   // Directory names containing partition-key values need to be UrlEncoded, in
>   // particular to avoid problems when '/' is part of the key value (which might
>   // occur, 

[jira] [Commented] (IMPALA-3192) Toolchain build should be able to use prebuilt artifacts

2024-04-26 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841087#comment-17841087
 ] 

Quanlong Huang commented on IMPALA-3192:


This will be helpful. When building only the ORC lib, I have to manually add a 
script for it like this:
{code:bash}
# Exit on non-true return value
set -e
# Exit on reference to uninitialized variable
set -u
set -o pipefail

source ./init.sh
source ./init-compiler.sh

export LZ4_VERSION=1.9.3
export PROTOBUF_VERSION=3.14.0
export SNAPPY_VERSION=1.1.8
export ZLIB_VERSION=1.2.13
export ZSTD_VERSION=1.5.2
export GOOGLETEST_VERSION=1.8.0
$SOURCE_DIR/source/protobuf/build.sh
$SOURCE_DIR/source/zlib/build.sh
$SOURCE_DIR/source/googletest/build.sh
$SOURCE_DIR/source/snappy/build.sh
$SOURCE_DIR/source/lz4/build.sh
$SOURCE_DIR/source/zstd/build.sh

ORC_VERSION=1.7.9-p10 $SOURCE_DIR/source/orc/build.sh {code}
But it still builds the compiler and dependencies in the first run.

> Toolchain build should be able to use prebuilt artifacts
> 
>
> Key: IMPALA-3192
> URL: https://issues.apache.org/jira/browse/IMPALA-3192
> Project: IMPALA
>  Issue Type: Task
>  Components: Infrastructure
>Affects Versions: Impala 2.5.0
>Reporter: casey
>Priority: Minor
>
> The toolchain build should have an option (maybe the default) to only build 
> what isn't already available for download. Currently, if you want to build 
> the toolchain locally it builds everything. I think the most common use case 
> for a local build is when you want to add something. In that case, you don't 
> want to redo the work of building existing components, they can just be 
> downloaded.
> This would also help avoid issues like 
> https://issues.cloudera.org/browse/IMPALA-3191






[jira] [Assigned] (IMPALA-12266) Sporadic failure after migrating a table to Iceberg

2024-04-26 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang reassigned IMPALA-12266:
---

Assignee: (was: Quanlong Huang)

> Sporadic failure after migrating a table to Iceberg
> ---
>
> Key: IMPALA-12266
> URL: https://issues.apache.org/jira/browse/IMPALA-12266
> Project: IMPALA
>  Issue Type: Bug
>  Components: fe
>Affects Versions: Impala 4.2.0
>Reporter: Tamas Mate
>Priority: Major
>  Labels: impala-iceberg
> Attachments: 
> catalogd.bd40020df22b.invalid-user.log.INFO.20230704-181939.1, 
> impalad.6c0f48d9ce66.invalid-user.log.INFO.20230704-181940.1
>
>
> TestIcebergTable.test_convert_table test failed in a recent verify job's 
> dockerised tests:
> https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/7629
> {code:none}
> E   ImpalaBeeswaxException: ImpalaBeeswaxException:
> EINNER EXCEPTION: 
> EMESSAGE: AnalysisException: Failed to load metadata for table: 
> 'parquet_nopartitioned'
> E   CAUSED BY: TableLoadingException: Could not load table 
> test_convert_table_cdba7383.parquet_nopartitioned from catalog
> E   CAUSED BY: TException: 
> TGetPartialCatalogObjectResponse(status:TStatus(status_code:GENERAL, 
> error_msgs:[NullPointerException: null]), lookup_status:OK)
> {code}
> {code:none}
> E0704 19:09:22.980131   833 JniUtil.java:183] 
> 7145c21173f2c47b:2579db55] Error in Getting partial catalog object of 
> TABLE:test_convert_table_cdba7383.parquet_nopartitioned. Time spent: 49ms
> I0704 19:09:22.980309   833 jni-util.cc:288] 
> 7145c21173f2c47b:2579db55] java.lang.NullPointerException
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.replaceTableIfUnchanged(CatalogServiceCatalog.java:2357)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getOrLoadTable(CatalogServiceCatalog.java:2300)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.doGetPartialCatalogObject(CatalogServiceCatalog.java:3587)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3513)
>   at 
> org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3480)
>   at 
> org.apache.impala.service.JniCatalog.lambda$getPartialCatalogObject$11(JniCatalog.java:397)
>   at 
> org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90)
>   at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58)
>   at 
> org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89)
>   at 
> org.apache.impala.service.JniCatalogOp.execAndSerializeSilentStartAndFinish(JniCatalogOp.java:109)
>   at 
> org.apache.impala.service.JniCatalog.execAndSerializeSilentStartAndFinish(JniCatalog.java:238)
>   at 
> org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:396)
> I0704 19:09:22.980324   833 status.cc:129] 7145c21173f2c47b:2579db55] 
> NullPointerException: null
> @  0x1012f9f  impala::Status::Status()
> @  0x187f964  impala::JniUtil::GetJniExceptionMsg()
> @   0xfee920  impala::JniCall::Call<>()
> @   0xfccd0f  impala::Catalog::GetPartialCatalogObject()
> @   0xfb55a5  
> impala::CatalogServiceThriftIf::GetPartialCatalogObject()
> @   0xf7a691  
> impala::CatalogServiceProcessorT<>::process_GetPartialCatalogObject()
> @   0xf82151  impala::CatalogServiceProcessorT<>::dispatchCall()
> @   0xee330f  apache::thrift::TDispatchProcessor::process()
> @  0x1329246  
> apache::thrift::server::TAcceptQueueServer::Task::run()
> @  0x1315a89  impala::ThriftThread::RunRunnable()
> @  0x131773d  
> boost::detail::function::void_function_obj_invoker0<>::invoke()
> @  0x195ba8c  impala::Thread::SuperviseThread()
> @  0x195c895  boost::detail::thread_data<>::run()
> @  0x23a03a7  thread_proxy
> @ 0x7faaad2a66ba  start_thread
> @ 0x7f2c151d  clone
> E0704 19:09:23.006968   833 catalog-server.cc:278] 
> 7145c21173f2c47b:2579db55] NullPointerException: null
> {code}






[jira] [Commented] (IMPALA-13037) EventsProcessorStressTest can hang

2024-04-24 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840619#comment-17840619
 ] 

Quanlong Huang commented on IMPALA-13037:
-

I also checked the logs of HiveServer2. There is an application id printed by the 
same thread:
{noformat}
2024-04-22T20:17:56,360  INFO [HiveServer2-Background-Pool: Thread-159] 
tez.TezTask: Subscribed to counters: [] for queryId: 
jenkins_20240422201755_96876acb-ee10-409e-a6da-bd1a9b4bc6df
2024-04-22T20:17:56,360  INFO [HiveServer2-Background-Pool: Thread-159] 
tez.TezTask: Tez session hasn't been created yet. Opening session
2024-04-22T20:17:56,360  INFO [HiveServer2-Background-Pool: Thread-159] 
tez.TezSessionState: User of session id d6d65f07-cdff-4f5c-bbb0-b2fa24d2d1cc is 
jenkins
2024-04-22T20:17:56,369  INFO [HiveServer2-Background-Pool: Thread-159] 
tez.DagUtils: Localizing resource because it does not exist: 
file:/data/jenkins/workspace/impala-asf-master-exhaustive-release/repos/Impala/fe/target/dependency/postgresql-42.5.1.jar
 to dest: 
hdfs://localhost:20500/tmp/hive/jenkins/_tez_session_dir/d6d65f07-cdff-4f5c-bbb0-b2fa24d2d1cc-resources/postgresql-42.5.1.jar
2024-04-22T20:17:56,549  INFO [HiveServer2-Background-Pool: Thread-159] 
tez.DagUtils: Resource modification time: 1713842276519 for 
hdfs://localhost:20500/tmp/hive/jenkins/_tez_session_dir/d6d65f07-cdff-4f5c-bbb0-b2fa24d2d1cc-resources/postgresql-42.5.1.jar
2024-04-22T20:17:56,625  INFO [HiveServer2-Background-Pool: Thread-159] 
tez.TezSessionState: Created new resources: null
2024-04-22T20:17:56,627  INFO [HiveServer2-Background-Pool: Thread-159] 
tez.DagUtils: Jar dir is null / directory doesn't exist. Choosing 
HIVE_INSTALL_DIR - /user/jenkins/.hiveJars
2024-04-22T20:17:57,105  INFO [HiveServer2-Background-Pool: Thread-159] 
tez.TezSessionState: Computed sha: 
77f0dcaafc28cfe7b2d805cdf2d3a083370b2299011e98eb893bd9573e3d4c10 for file: 
file:/data0/jenkins/workspace/impala-asf-master-exhaustive-release/Impala-Toolchain/cdp_components-45689292/apache-hive-3.1.3000.7.2.18.0-369-bin/lib/hive-exec-3.1.3000.7.2.18.0-369.jar
 of length: 74.73MB in 474 ms
2024-04-22T20:17:57,109  INFO [HiveServer2-Background-Pool: Thread-159] 
tez.DagUtils: Resource modification time: 1713837749334 for 
hdfs://localhost:20500/user/jenkins/.hiveJars/hive-exec-3.1.3000.7.2.18.0-369-77f0dcaafc28cfe7b2d805cdf2d3a083370b2299011e98eb893bd9573e3d4c10.jar
2024-04-22T20:17:57,227  INFO [HiveServer2-Background-Pool: Thread-159] 
counters.Limits: Counter limits initialized with parameters:  
GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=1200
2024-04-22T20:17:57,227  INFO [HiveServer2-Background-Pool: Thread-159] 
counters.Limits: Counter limits initialized with parameters:  
GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=120
2024-04-22T20:17:57,227  INFO [HiveServer2-Background-Pool: Thread-159] 
client.TezClient: Tez Client Version: [ component=tez-api, 
version=0.9.1.7.2.18.0-369, revision=590a68b8a743783155fea2e6f2026f01a8775635, 
SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, 
buildTime=2023-09-28T12:31:39Z ]
2024-04-22T20:17:57,227  INFO [HiveServer2-Background-Pool: Thread-159] 
tez.TezSessionState: Opening new Tez Session (id: 
d6d65f07-cdff-4f5c-bbb0-b2fa24d2d1cc, scratch dir: 
hdfs://localhost:20500/tmp/hive/jenkins/_tez_session_dir/d6d65f07-cdff-4f5c-bbb0-b2fa24d2d1cc)
2024-04-22T20:17:57,293  INFO [HiveServer2-Background-Pool: Thread-159] 
client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2024-04-22T20:17:57,575  INFO [HiveServer2-Background-Pool: Thread-159] 
client.TezClient: Session mode. Starting session.
2024-04-22T20:17:57,664  INFO [HiveServer2-Background-Pool: Thread-159] 
client.TezClientUtils: Ignoring 'tez.lib.uris' since  'tez.ignore.lib.uris' is 
set to true
2024-04-22T20:17:57,675  INFO [HiveServer2-Background-Pool: Thread-159] 
client.TezClient: Tez system stage directory 
hdfs://localhost:20500/tmp/hive/jenkins/_tez_session_dir/d6d65f07-cdff-4f5c-bbb0-b2fa24d2d1cc/.tez/application_1713840366821_0001
 doesn't exist and is created
2024-04-22T20:17:57,699  INFO [HiveServer2-Background-Pool: Thread-159] 
conf.Configuration: resource-types.xml not found
2024-04-22T20:17:57,699  INFO [HiveServer2-Background-Pool: Thread-159] 
resource.ResourceUtils: Unable to find 'resource-types.xml'.
2024-04-22T20:17:57,704  INFO [HiveServer2-Background-Pool: Thread-159] 
common.TezYARNUtils: Ignoring 'tez.lib.uris' since  'tez.ignore.lib.uris' is 
set to true
2024-04-22T20:17:57,715  INFO [HiveServer2-Background-Pool: Thread-159] 
Configuration.deprecation: 
yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, 
use yarn.system-metrics-publisher.enabled
2024-04-22T20:17:58,223  INFO [HiveServer2-Background-Pool: Thread-159] 
impl.YarnClientImpl: Submitted application application_1713840366821_0001
2024-04-22T20:17:58,226  

[jira] [Commented] (IMPALA-13034) Add logs for slow HTTP requests dumping the profile

2024-04-24 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840598#comment-17840598
 ] 

Quanlong Huang commented on IMPALA-13034:
-

Yeah, IMPALA-9380 only helps with finalizing (unregistering) a query. These HTTP 
requests come in while the query is still running. We need a lock to protect such 
read requests from concurrent modification of the profile. We could probably add 
a more fine-grained lock just for reading/writing the profile so that client 
result fetching is not blocked.
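The idea can be sketched as follows (hypothetical names; Impala's real ClientRequestState is more involved, so this is an assumption-laden illustration, not the actual class): the profile gets its own mutex, so a slow WebUI dump never holds the coarse per-query lock that result fetches contend on.

```cpp
#include <mutex>
#include <string>

// Hypothetical sketch of the fine-grained locking idea; names only
// loosely mirror Impala's -- this is not the actual code.
class QueryStateSketch {
 public:
  // Executor side: appends profile updates under the profile-only lock.
  void UpdateProfile(const std::string& delta) {
    std::lock_guard<std::mutex> l(profile_lock_);
    profile_ += delta;
  }

  // WebUI side: dumping the profile takes only profile_lock_, so a large
  // or slow dump does not block FetchResults(), which uses query_lock_.
  std::string DumpProfile() const {
    std::lock_guard<std::mutex> l(profile_lock_);
    return profile_;
  }

  // Client side: result fetching is serialized by the coarse query lock
  // and never waits on a profile dump.
  int FetchResults() {
    std::lock_guard<std::mutex> l(query_lock_);
    return ++rows_fetched_;
  }

 private:
  std::mutex query_lock_;            // coarse per-query state lock
  mutable std::mutex profile_lock_;  // fine-grained: guards profile_ only
  std::string profile_;
  int rows_fetched_ = 0;
};
```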

> Add logs for slow HTTP requests dumping the profile
> ---
>
> Key: IMPALA-13034
> URL: https://issues.apache.org/jira/browse/IMPALA-13034
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Quanlong Huang
>Priority: Critical
>
> There are several endpoints in WebUI that can dump a query profile: 
> /query_profile, /query_profile_encoded, /query_profile_plain_text, 
> /query_profile_json
> The HTTP handler thread goes into ImpalaServer::GetRuntimeProfileOutput(), 
> which acquires the lock of the ClientRequestState. This could block client 
> requests fetching query results. We should add warning logs when such HTTP 
> requests run slow (e.g. when the profile is too large to download in a short 
> time). The IP address and other info of such requests should also be logged.
> Related codes:
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-server.cc#L736
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-beeswax-server.cc#L601
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-hs2-server.cc#L207






[jira] [Commented] (IMPALA-13028) libkudu_client.so is not stripped in the DEB/RPM packages

2024-04-24 Thread Quanlong Huang (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840597#comment-17840597
 ] 

Quanlong Huang commented on IMPALA-13028:
-

Probably because it has lots of unused dependencies: IMPALA-12955.

I think it'd be nice to have it if we can reduce its size.

> libkudu_client.so is not stripped in the DEB/RPM packages
> -
>
> Key: IMPALA-13028
> URL: https://issues.apache.org/jira/browse/IMPALA-13028
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Quanlong Huang
>Assignee: XiangYang
>Priority: Major
>
> The current DEB package is 611M on ubuntu18.04. Here are the top-10 largest 
> files:
> {noformat}
> 14 MB 
> ./opt/impala/lib/jars/hive-standalone-metastore-3.1.3000.7.2.18.0-369.jar
> 15 MB ./opt/impala/lib/jars/kudu-client-e742f86f6d.jar
> 20 MB ./opt/impala/lib/native/libstdc++.so.6.0.28
> 22 MB ./opt/impala/lib/jars/js-22.3.0.jar
> 29 MB ./opt/impala/lib/jars/iceberg-hive-runtime-1.3.1.7.2.18.0-369.jar
> 60 MB ./opt/impala/lib/jars/ozone-filesystem-hadoop3-1.3.0.7.2.18.0-369.jar
> 84 MB ./opt/impala/util/impala-profile-tool
> 85 MB ./opt/impala/sbin/impalad
> 175 MB ./opt/impala/lib/jars/impala-minimal-s3a-aws-sdk-4.4.0-SNAPSHOT.jar
> 188 MB ./opt/impala/lib/native/libkudu_client.so.0.1.0{noformat}
> It appears that we just strip binaries built by Impala, e.g. impalad and 
> impala-profile-tool.
> libkudu_client.so.0.1.0 remains the same as the one in the toolchain folder.
> {code:bash}
> $ ll -th 
> toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/release/lib/libkudu_client.so.0.1.0
> -rw-r--r-- 1 quanlong quanlong 189M 10月 18  2023 
> toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/release/lib/libkudu_client.so.0.1.0
> $ file 
> toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/release/lib/libkudu_client.so.0.1.0
> toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/release/lib/libkudu_client.so.0.1.0:
>  ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, 
> with debug_info, not stripped{code}
> CC [~yx91490] [~boroknagyz] [~rizaon]






[jira] [Created] (IMPALA-13034) Add logs for slow HTTP requests dumping the profile

2024-04-24 Thread Quanlong Huang (Jira)
Quanlong Huang created IMPALA-13034:
---

 Summary: Add logs for slow HTTP requests dumping the profile
 Key: IMPALA-13034
 URL: https://issues.apache.org/jira/browse/IMPALA-13034
 Project: IMPALA
  Issue Type: Bug
  Components: Backend
Reporter: Quanlong Huang


There are several endpoints in WebUI that can dump a query profile: 
/query_profile, /query_profile_encoded, /query_profile_plain_text, 
/query_profile_json

The HTTP handler thread goes into ImpalaServer::GetRuntimeProfileOutput(), which 
acquires the lock of the ClientRequestState. This could block client requests 
fetching query results. We should add warning logs when such HTTP requests run 
slow (e.g. when the profile is too large to download in a short time).

Related codes:
https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-server.cc#L736
https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-beeswax-server.cc#L601
https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-hs2-server.cc#L207





[jira] [Updated] (IMPALA-13034) Add logs for slow HTTP requests dumping the profile

2024-04-24 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-13034:

Description: 
There are several endpoints in WebUI that can dump a query profile: 
/query_profile, /query_profile_encoded, /query_profile_plain_text, 
/query_profile_json

The HTTP handler thread goes into ImpalaServer::GetRuntimeProfileOutput(), which 
acquires the lock of the ClientRequestState. This could block client requests 
fetching query results. We should add warning logs when such HTTP requests run 
slow (e.g. when the profile is too large to download in a short time). The IP 
address and other info of such requests should also be logged.

Related codes:
https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-server.cc#L736
https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-beeswax-server.cc#L601
https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-hs2-server.cc#L207

  was:
There are several endpoints in WebUI that can dump a query profile: 
/query_profile, /query_profile_encoded, /query_profile_plain_text, 
/query_profile_json

The HTTP handler thread goes into ImpalaServer::GetRuntimeProfileOutput(), which 
acquires the lock of the ClientRequestState. This could block client requests 
fetching query results. We should add warning logs when such HTTP requests run 
slow (e.g. when the profile is too large to download in a short time).

Related codes:
https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-server.cc#L736
https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-beeswax-server.cc#L601
https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-hs2-server.cc#L207


> Add logs for slow HTTP requests dumping the profile
> ---
>
> Key: IMPALA-13034
> URL: https://issues.apache.org/jira/browse/IMPALA-13034
> Project: IMPALA
>  Issue Type: Bug
>  Components: Backend
>Reporter: Quanlong Huang
>Priority: Critical
>
> There are several endpoints in WebUI that can dump a query profile: 
> /query_profile, /query_profile_encoded, /query_profile_plain_text, 
> /query_profile_json
> The HTTP handler thread goes into ImpalaServer::GetRuntimeProfileOutput(), 
> which acquires the lock of the ClientRequestState. This could block client 
> requests fetching query results. We should add warning logs when such HTTP 
> requests run slow (e.g. when the profile is too large to download in a short 
> time). The IP address and other info of such requests should also be logged.
> Related codes:
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-server.cc#L736
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-beeswax-server.cc#L601
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-hs2-server.cc#L207






[jira] [Updated] (IMPALA-13033) impala-profile-tool should support parsing thrift profiles downloaded from WebUI

2024-04-23 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang updated IMPALA-13033:

Labels: newbie ramp-up  (was: )

> impala-profile-tool should support parsing thrift profiles downloaded from 
> WebUI
> 
>
> Key: IMPALA-13033
> URL: https://issues.apache.org/jira/browse/IMPALA-13033
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Quanlong Huang
>Priority: Major
>  Labels: newbie, ramp-up
>
> In the coordinator WebUI, users can download query profiles in 
> text/json/thrift formats. The thrift profile is the same as one line in the 
> profile log without the timestamp and query id at the beginning.
> impala-profile-tool fails to parse such a file. It should retry parsing the 
> whole line as the encoded profile. Current code snippet:
> {code:cpp}
> // Parse out fields from the line.
> istringstream liness(line);
> int64_t timestamp;
> string query_id, encoded_profile;
> liness >> timestamp >> query_id >> encoded_profile;
> if (liness.fail()) {
>   cerr << "Error parsing line " << lineno << ": '" << line << "'\n";
>   ++errors;
>   continue;
> }{code}
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/util/impala-profile-tool.cc#L109
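The suggested retry could look roughly like this (a sketch under the assumption that a WebUI-downloaded thrift profile is the bare encoded string on one line; this is not the actual impala-profile-tool patch, and the function name is made up):

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Sketch of the proposed fallback: profile-log lines are
// "<timestamp> <query_id> <encoded_profile>", while a thrift profile
// downloaded from the WebUI is just the bare encoded string.
bool ParseProfileLine(const std::string& line, int64_t* timestamp,
                      std::string* query_id, std::string* encoded_profile) {
  std::istringstream liness(line);
  // Try the three-field profile-log format first.
  if (liness >> *timestamp >> *query_id >> *encoded_profile) return true;
  // Fallback: retry by treating the entire line as the encoded profile.
  *timestamp = 0;
  query_id->clear();
  *encoded_profile = line;
  return !line.empty();
}
```

A base64-encoded profile contains no spaces, so the three-field extraction always fails on it (at the latest when the third token is missing), and the fallback kicks in.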






[jira] [Assigned] (IMPALA-13033) impala-profile-tool should support parsing thrift profiles downloaded from WebUI

2024-04-23 Thread Quanlong Huang (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-13033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Quanlong Huang reassigned IMPALA-13033:
---

Assignee: (was: Quanlong Huang)

> impala-profile-tool should support parsing thrift profiles downloaded from 
> WebUI
> 
>
> Key: IMPALA-13033
> URL: https://issues.apache.org/jira/browse/IMPALA-13033
> Project: IMPALA
>  Issue Type: Bug
>  Components: Infrastructure
>Reporter: Quanlong Huang
>Priority: Major
>
> In the coordinator WebUI, users can download query profiles in 
> text/json/thrift formats. The thrift profile is the same as one line in the 
> profile log without the timestamp and query id at the beginning.
> impala-profile-tool fails to parse such a file. It should retry parsing the 
> whole line as the encoded profile. Current code snippet:
> {code:cpp}
> // Parse out fields from the line.
> istringstream liness(line);
> int64_t timestamp;
> string query_id, encoded_profile;
> liness >> timestamp >> query_id >> encoded_profile;
> if (liness.fail()) {
>   cerr << "Error parsing line " << lineno << ": '" << line << "'\n";
>   ++errors;
>   continue;
> }{code}
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/util/impala-profile-tool.cc#L109





