[jira] [Created] (IMPALA-13154) Some tables are missing in Top-N Tables with Highest Memory Requirements
Quanlong Huang created IMPALA-13154:
---------------------------------------

Summary: Some tables are missing in Top-N Tables with Highest Memory Requirements
Key: IMPALA-13154
URL: https://issues.apache.org/jira/browse/IMPALA-13154
Project: IMPALA
Issue Type: Bug
Components: Catalog
Reporter: Quanlong Huang

In the /catalog page of the catalogd WebUI, there is a table for "Top-N Tables with Highest Memory Requirements". However, not all tables are counted there. E.g. after starting catalogd, run a DESCRIBE on a table to trigger metadata loading on it. When it's done, the table is not shown in the WebUI.

The cause is that the list is only updated in HdfsTable.getTHdfsTable() when 'type' is ThriftObjectType.FULL:
[https://github.com/apache/impala/blob/ee21427d26620b40d38c706b4944d2831f84f6f5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L2457-L2459]

This used to be a place that all code paths using the table would go through. However, we've since made a bunch of optimizations that avoid fetching the FULL thrift object of the table. We should move the code that updates the list of largest tables somewhere that all table usages can reach, e.g. after loading the metadata of a table, we can update its estimatedMetadataSize.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
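The suggested direction, updating the largest-tables list right after metadata loading rather than only in the FULL thrift serialization path, can be sketched with a small self-contained tracker. The class and method names below are hypothetical stand-ins, not Impala's actual catalog classes:

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of a tracker for "Top-N Tables with Highest Memory
// Requirements". The key point is that update() is called right after a
// table's metadata is loaded (when its estimated metadata size is computed),
// so every load path reaches it, not only getTHdfsTable() with FULL type.
class TopNMetadataTracker {
    private final int n;
    private final Map<String, Long> estimatedSizes = new HashMap<>();

    TopNMetadataTracker(int n) { this.n = n; }

    // Record the table's latest estimated metadata size.
    synchronized void update(String tableName, long estimatedMetadataSize) {
        estimatedSizes.put(tableName, estimatedMetadataSize);
    }

    // The n tables with the largest estimated metadata sizes, largest first.
    synchronized List<String> topTables() {
        return estimatedSizes.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder()))
            .limit(n)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }
}
```

A DESCRIBE-triggered load would then call update() as its final step, so the WebUI list stays complete regardless of which code path loaded the table.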
[jira] [Commented] (IMPALA-13152) IllegalStateException in computing processing cost when there are predicates on analytic output columns
[ https://issues.apache.org/jira/browse/IMPALA-13152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853924#comment-17853924 ]

Quanlong Huang commented on IMPALA-13152:
-----------------------------------------

Assigning this to [~rizaon] who knows more about this.

> IllegalStateException in computing processing cost when there are predicates
> on analytic output columns
> ---
>
> Key: IMPALA-13152
> URL: https://issues.apache.org/jira/browse/IMPALA-13152
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Reporter: Quanlong Huang
> Assignee: Riza Suminto
> Priority: Major
[jira] [Created] (IMPALA-13152) IllegalStateException in computing processing cost when there are predicates on analytic output columns
Quanlong Huang created IMPALA-13152:
---------------------------------------

Summary: IllegalStateException in computing processing cost when there are predicates on analytic output columns
Key: IMPALA-13152
URL: https://issues.apache.org/jira/browse/IMPALA-13152
Project: IMPALA
Issue Type: Bug
Components: Frontend
Reporter: Quanlong Huang
Assignee: Riza Suminto

Saw an error in the following query when COMPUTE_PROCESSING_COST is on:
{code:sql}
create table tbl (a int, b int, c int);
set COMPUTE_PROCESSING_COST=1;
explain select a, b from (
  select a, b, c,
    row_number() over(partition by a order by b desc) as latest
  from tbl
)b
WHERE latest=1

ERROR: IllegalStateException: Processing cost of PlanNode 01:TOP-N is invalid!
{code}
Exception in the logs:
{noformat}
I0611 13:04:37.192874 28004 jni-util.cc:321] 264ee79bfb6ac031:42f8006c] java.lang.IllegalStateException: Processing cost of PlanNode 01:TOP-N is invalid!
        at com.google.common.base.Preconditions.checkState(Preconditions.java:512)
        at org.apache.impala.planner.PlanNode.computeRowConsumptionAndProductionToCost(PlanNode.java:1047)
        at org.apache.impala.planner.PlanFragment.computeCostingSegment(PlanFragment.java:287)
        at org.apache.impala.planner.Planner.computeProcessingCost(Planner.java:560)
        at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1932)
        at org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:2892)
        at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2676)
        at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
        at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
        at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175)
{noformat}
The error disappears if the predicate "latest=1" is removed.
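The failing guard is a Guava Preconditions.checkState() call on the node's processing-cost validity. A minimal self-contained mimic of that pattern (the ProcessingCost model below is a hypothetical simplification for illustration, not Impala's planner code):

```java
// Hypothetical simplification: treat a processing cost as "valid" only when
// it is non-negative, and guard it with a checkState-style precondition that
// throws IllegalStateException, producing the error message seen above.
class ProcessingCost {
    final double cost;
    ProcessingCost(double cost) { this.cost = cost; }
    boolean isValid() { return cost >= 0; }
}

class PlanNodeCheck {
    // Same contract as Guava's Preconditions.checkState(boolean, Object).
    static void checkState(boolean condition, String message) {
        if (!condition) throw new IllegalStateException(message);
    }

    // Mirrors the shape of the guard in the stack trace above.
    static void validate(String nodeName, ProcessingCost c) {
        checkState(c.isValid(),
            "Processing cost of PlanNode " + nodeName + " is invalid!");
    }
}
```

Under this simplification, any code path that leaves the TOP-N node's cost unset or negative (here represented by -1, an assumed sentinel) trips the check.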
[jira] [Commented] (IMPALA-13093) Insert into Huawei OBS table failed
[ https://issues.apache.org/jira/browse/IMPALA-13093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17853843#comment-17853843 ]

Quanlong Huang commented on IMPALA-13093:
-----------------------------------------

It seems adding this to hdfs-site.xml can also fix the issue:
{code:xml}
<property>
  <name>fs.obs.file.visibility.enable</name>
  <value>true</value>
</property>
{code}
I'll check whether OBS returns the real block size. CC [~michaelsmith] [~eyizoha]

> Insert into Huawei OBS table failed
> ---
>
> Key: IMPALA-13093
> URL: https://issues.apache.org/jira/browse/IMPALA-13093
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 4.3.0
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
>
> Inserting into a table that uses Huawei OBS (Object Storage Service) as the
> storage will fail with the following error:
> {noformat}
> Query: insert into test_obs1 values (1, 'abc')
> ERROR: Failed to get info on temporary HDFS file:
> obs://obs-test-ee93/input/test_obs1/_impala_insert_staging/fe4ac1be6462a13f_362a9b5b/.fe4ac1be6462a13f-362a9b5b_1213692075_dir//fe4ac1be6462a13f-362a9b5b_375832652_data.0.txt
> Error(2): No such file or directory {noformat}
> Looking into the logs:
> {noformat}
> I0516 16:40:55.663640 18922 status.cc:129] fe4ac1be6462a13f:362a9b5b] Failed to get info on temporary HDFS file:
> obs://obs-test-ee93/input/test_obs1/_impala_insert_staging/fe4ac1be6462a13f_362a9b5b/.fe4ac1be6462a13f-362a9b5b_1213692075_dir//fe4ac1be6462a13f-362a9b5b_375832652_data.0.txt
> Error(2): No such file or directory
> @ 0xfc6d44 impala::Status::Status()
> @ 0x1c42020 impala::HdfsTableSink::CreateNewTmpFile()
> @ 0x1c44357 impala::HdfsTableSink::InitOutputPartition()
> @ 0x1c4988a impala::HdfsTableSink::GetOutputPartition()
> @ 0x1c46569 impala::HdfsTableSink::Send()
> @ 0x14ee25f impala::FragmentInstanceState::ExecInternal()
> @ 0x14efca3 impala::FragmentInstanceState::Exec()
> @ 0x148dc4c impala::QueryState::ExecFInstance()
> @ 0x1b3bab9 impala::Thread::SuperviseThread()
> @ 0x1b3cdb1 boost::detail::thread_data<>::run()
> @ 0x2474a87 thread_proxy
> @ 0x7fe5a562dea5 start_thread
> @ 0x7fe5a25ddb0d __clone{noformat}
> Note that impalad is started with {{--symbolize_stacktrace=true}} so the
> stacktrace has symbols.
[jira] [Created] (IMPALA-13149) Show JVM info in the WebUI
Quanlong Huang created IMPALA-13149:
---------------------------------------

Summary: Show JVM info in the WebUI
Key: IMPALA-13149
URL: https://issues.apache.org/jira/browse/IMPALA-13149
Project: IMPALA
Issue Type: New Feature
Reporter: Quanlong Huang

It'd be helpful to show the JVM info in the WebUI, e.g. the output of "java -version":
{code:java}
openjdk version "1.8.0_412"
OpenJDK Runtime Environment (build 1.8.0_412-b08)
OpenJDK 64-Bit Server VM (build 25.412-b08, mixed mode){code}
On nodes that only have a JRE deployed, we'd like to deploy the same version of the JDK to perform heap dumps (jmap). Showing the JVM info in the WebUI would make that easier.
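One way to surface this without shelling out to "java -version" is to read the standard JVM system properties from the running process; a minimal sketch:

```java
// Minimal sketch: build a "java -version"-style description from standard
// JVM system properties, suitable for rendering in a WebUI page. These
// property keys (java.version, java.runtime.name, etc.) are standard JVM
// properties, available on any compliant JVM.
class JvmInfo {
    static String describe() {
        return String.format("version \"%s\"%n%s (build %s)%n%s (build %s)",
            System.getProperty("java.version"),
            System.getProperty("java.runtime.name"),
            System.getProperty("java.runtime.version"),
            System.getProperty("java.vm.name"),
            System.getProperty("java.vm.version"));
    }
}
```

Reading the properties in-process also reports the JVM that is actually running the daemon, which is the version that matters when matching a JDK for jmap.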
[jira] [Updated] (IMPALA-13148) Show the number of in-progress Catalog operations
[ https://issues.apache.org/jira/browse/IMPALA-13148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-13148:
------------------------------------
Attachment: Selection_123.png
            Selection_122.png

> Show the number of in-progress Catalog operations
> ---
>
> Key: IMPALA-13148
> URL: https://issues.apache.org/jira/browse/IMPALA-13148
> Project: IMPALA
> Issue Type: Improvement
> Reporter: Quanlong Huang
> Priority: Major
> Labels: newbie, ramp-up
> Attachments: Selection_122.png, Selection_123.png
[jira] [Created] (IMPALA-13148) Show the number of in-progress Catalog operations
Quanlong Huang created IMPALA-13148:
---------------------------------------

Summary: Show the number of in-progress Catalog operations
Key: IMPALA-13148
URL: https://issues.apache.org/jira/browse/IMPALA-13148
Project: IMPALA
Issue Type: Improvement
Reporter: Quanlong Huang
Attachments: Selection_122.png, Selection_123.png

In the /operations page of the catalogd WebUI, the list of In-progress Catalog Operations is shown. It'd be helpful to also show the number of such operations, like the /queries page of the coordinator WebUI, which shows e.g. "100 queries in flight".
[jira] [Created] (IMPALA-13126) ReloadEvent.isOlderEvent() should hold the table read lock
Quanlong Huang created IMPALA-13126:
---------------------------------------

Summary: ReloadEvent.isOlderEvent() should hold the table read lock
Key: IMPALA-13126
URL: https://issues.apache.org/jira/browse/IMPALA-13126
Project: IMPALA
Issue Type: Bug
Components: Catalog
Reporter: Quanlong Huang
Assignee: Sai Hemanth Gantasala

Saw an exception like this:
{noformat}
E0601 09:11:25.275251 246 MetastoreEventsProcessor.java:990] Unexpected exception received while processing event
Java exception follows:
java.util.ConcurrentModificationException
        at java.util.HashMap$HashIterator.nextNode(HashMap.java:1469)
        at java.util.HashMap$ValueIterator.next(HashMap.java:1498)
        at org.apache.impala.catalog.FeFsTable$Utils.getPartitionFromThriftPartitionSpec(FeFsTable.java:616)
        at org.apache.impala.catalog.HdfsTable.getPartitionFromThriftPartitionSpec(HdfsTable.java:597)
        at org.apache.impala.catalog.Catalog.getHdfsPartition(Catalog.java:511)
        at org.apache.impala.catalog.Catalog.getHdfsPartition(Catalog.java:489)
        at org.apache.impala.catalog.CatalogServiceCatalog.isPartitionLoadedAfterEvent(CatalogServiceCatalog.java:4024)
        at org.apache.impala.catalog.events.MetastoreEvents$ReloadEvent.isOlderEvent(MetastoreEvents.java:2754)
        at org.apache.impala.catalog.events.MetastoreEvents$ReloadEvent.processTableEvent(MetastoreEvents.java:2729)
        at org.apache.impala.catalog.events.MetastoreEvents$MetastoreTableEvent.process(MetastoreEvents.java:1107)
        at org.apache.impala.catalog.events.MetastoreEvents$MetastoreEvent.processIfEnabled(MetastoreEvents.java:531)
        at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:1164)
        at org.apache.impala.catalog.events.MetastoreEventsProcessor.processEvents(MetastoreEventsProcessor.java:972)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
{noformat}
For a partition-level RELOAD event, ReloadEvent.isOlderEvent() needs to check whether the corresponding partition was reloaded after the event. This should be done while holding the table read lock. Otherwise, EventProcessor can hit the error above when there are concurrent DDLs/DMLs modifying the partition list.

CC [~VenuReddy]
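The proposed fix, taking the table read lock before iterating partition state, can be sketched with a self-contained ReadWriteLock example. The class and method names below are hypothetical stand-ins for Impala's HdfsTable/ReloadEvent code, kept minimal for illustration:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: guard iteration over the partition map with a read
// lock so concurrent DDL/DML (which takes the write lock to modify the map)
// cannot trigger ConcurrentModificationException during event processing.
class PartitionedTable {
    private final Map<String, Long> partitionLoadVersions = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Writer path: a DDL/DML modifying the partition list.
    void addPartition(String name, long loadedAtEventId) {
        lock.writeLock().lock();
        try {
            partitionLoadVersions.put(name, loadedAtEventId);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Reader path: event processing checks, under the read lock, whether all
    // partitions were already reloaded after the given event id.
    boolean allPartitionsLoadedAfter(long eventId) {
        lock.readLock().lock();
        try {
            for (long v : partitionLoadVersions.values()) {
                if (v <= eventId) return false;
            }
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Readers can proceed concurrently with each other; only map-modifying writers are serialized, so the read lock is cheap on the event-processing path.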
[jira] [Created] (IMPALA-13122) Show file stats in table loading logs
Quanlong Huang created IMPALA-13122:
---------------------------------------

Summary: Show file stats in table loading logs
Key: IMPALA-13122
URL: https://issues.apache.org/jira/browse/IMPALA-13122
Project: IMPALA
Issue Type: Improvement
Components: Catalog
Reporter: Quanlong Huang

Here is an example of the loading logs for a table:
{noformat}
I0603 08:46:05.67 24417 HdfsTable.java:1255] Loading metadata for table definition and all partition(s) of tpcds.store_sales (needed by coordinator)
I0603 08:46:05.642702 24417 HdfsTable.java:1896] Loaded 23 columns from HMS. Actual columns: 23
I0603 08:46:05.767457 24417 HdfsTable.java:3114] Load Valid Write Id List Done. Time taken: 26.699us
I0603 08:46:05.767549 24417 HdfsTable.java:1297] Fetching partition metadata from the Metastore: tpcds.store_sales
I0603 08:46:05.806337 24417 MetaStoreUtil.java:190] Fetching 1824 partitions for: tpcds.store_sales using partition batch size: 1000
I0603 08:46:07.336064 24417 MetaStoreUtil.java:208] Fetched 1000/1824 partitions for table tpcds.store_sales
I0603 08:46:07.915474 24417 MetaStoreUtil.java:208] Fetched 1824/1824 partitions for table tpcds.store_sales
I0603 08:46:07.915519 24417 HdfsTable.java:1304] Fetched partition metadata from the Metastore: tpcds.store_sales
I0603 08:46:08.840034 24417 ParallelFileMetadataLoader.java:224] Loading file and block metadata for 1824 paths for table tpcds.store_sales using a thread pool of size 5
I0603 08:46:09.383904 24417 HdfsTable.java:836] Loaded file and block metadata for tpcds.store_sales partitions: ss_sold_date_sk=2450816, ss_sold_date_sk=2450817, ss_sold_date_sk=2450818, and 1821 others. Time taken: 569.107ms
I0603 08:46:09.420702 24417 Table.java:1117] last refreshed event id for table: tpcds.store_sales set to: -1
I0603 08:46:09.420794 24417 TableLoader.java:177] Loaded metadata for: tpcds.store_sales (4026ms){noformat}
From the logs, we know the table has 23 columns and 1824 partitions. The time spent loading the table schema and file metadata is also shown.

However, it's unknown whether there is a small-files issue under the partitions. The underlying storage could also be slow (e.g. S3), which results in a long time loading file metadata. It'd be helpful to add these to the logs:
* number of files loaded
* min/avg/max of file sizes
* total file size
* number of blocks (HDFS only)
* number of hosts, disks (HDFS/Ozone only)
* stats of accessTime and lastModifiedTime

These can be aggregated in FileMetadataLoader#loadInternal() and logged in ParallelFileMetadataLoader#load() or HdfsTable#loadFileMetadataForPartitions():
[https://github.com/apache/impala/blob/9011b81afa33ef7e4b0ec8a367b2713be8917213/fe/src/main/java/org/apache/impala/catalog/FileMetadataLoader.java#L177]
[https://github.com/apache/impala/blob/9011b81afa33ef7e4b0ec8a367b2713be8917213/fe/src/main/java/org/apache/impala/catalog/ParallelFileMetadataLoader.java#L172]
[https://github.com/apache/impala/blob/ee21427d26620b40d38c706b4944d2831f84f6f5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java#L836]
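The suggested aggregation can be sketched as a small accumulator that each file-metadata loader thread feeds once per file, and whose summary is logged once per table. The class and method names are hypothetical, not Impala's FileMetadataLoader API:

```java
// Hypothetical sketch: accumulate per-file stats during file metadata
// loading, then emit a single summary line per table. addFile() is
// synchronized because loader threads run in a pool.
class FileStatsAccumulator {
    private long count = 0;
    private long totalBytes = 0;
    private long minBytes = Long.MAX_VALUE;
    private long maxBytes = 0;

    // Called once per file from the loader threads.
    synchronized void addFile(long sizeBytes) {
        count++;
        totalBytes += sizeBytes;
        minBytes = Math.min(minBytes, sizeBytes);
        maxBytes = Math.max(maxBytes, sizeBytes);
    }

    // One-line summary suitable for the table loading log.
    synchronized String summary() {
        if (count == 0) return "0 files";
        return String.format("%d files, total %d bytes, min/avg/max %d/%d/%d",
            count, totalBytes, minBytes, totalBytes / count, maxBytes);
    }
}
```

A min far below the avg (or a huge count with a small total) in such a summary would make a small-files problem visible directly from the logs.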
[jira] [Created] (IMPALA-13117) Improve the heap usage during metadata loading and DDL/DML executions
Quanlong Huang created IMPALA-13117:
---------------------------------------

Summary: Improve the heap usage during metadata loading and DDL/DML executions
Key: IMPALA-13117
URL: https://issues.apache.org/jira/browse/IMPALA-13117
Project: IMPALA
Issue Type: Improvement
Components: Catalog
Reporter: Quanlong Huang
Assignee: Quanlong Huang

The JVM heap of catalogd is not used just by the metadata cache. In-progress metadata loading threads and DDL/DML executions also create temporary objects, which introduces spikes in heap usage. We should improve the heap usage in these paths, especially when metadata loading is slow due to external slowness (e.g. listing files on S3).

CC [~mylogi...@gmail.com]
[jira] [Assigned] (IMPALA-13116) In local-catalog mode, abort REFRESH and metadata reloading of DDL/DMLs if the table is invalidated
[ https://issues.apache.org/jira/browse/IMPALA-13116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang reassigned IMPALA-13116:
---------------------------------------
Assignee: Quanlong Huang

> In local-catalog mode, abort REFRESH and metadata reloading of DDL/DMLs if
> the table is invalidated
> ---
>
> Key: IMPALA-13116
> URL: https://issues.apache.org/jira/browse/IMPALA-13116
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
[jira] [Created] (IMPALA-13116) In local-catalog mode, abort REFRESH and metadata reloading of DDL/DMLs if the table is invalidated
Quanlong Huang created IMPALA-13116:
---------------------------------------

Summary: In local-catalog mode, abort REFRESH and metadata reloading of DDL/DMLs if the table is invalidated
Key: IMPALA-13116
URL: https://issues.apache.org/jira/browse/IMPALA-13116
Project: IMPALA
Issue Type: Improvement
Components: Catalog
Reporter: Quanlong Huang

A table can be invalidated while there are DDL/DML/REFRESHs running in flight:
* A user can explicitly trigger an INVALIDATE METADATA command.
* The table could be invalidated by CatalogdTableInvalidator when invalidate_tables_on_memory_pressure or invalidate_tables_timeout_s is turned on.

Note that invalidating a table doesn't require holding the lock of the HdfsTable object, so it can finish even if there are on-going updates on the table.

The updated HdfsTable object won't be added to the metadata cache since it has been replaced with an IncompleteTable object. It's only used in the DDL/DML/REFRESH responses. In local-catalog mode, the response is the minimal representation, which is mostly the table name and catalog version. We don't need the updates on the HdfsTable object to finish. Thus, we can consider aborting the reloading for such DDL/DML/REFRESH requests.
[jira] [Created] (IMPALA-13116) In local-catalog mode, abort REFRESH and metadata reloading of DDL/DMLs if the table is invalidated
Quanlong Huang created IMPALA-13116: --- Summary: In local-catalog mode, abort REFRESH and metadata reloading of DDL/DMLs if the table is invalidated Key: IMPALA-13116 URL: https://issues.apache.org/jira/browse/IMPALA-13116 Project: IMPALA Issue Type: Improvement Components: Catalog Reporter: Quanlong Huang A table can be invalidated while there are DDL/DML/REFRESH operations in flight: * Users can explicitly trigger an INVALIDATE METADATA command * The table could be invalidated by CatalogdTableInvalidator when invalidate_tables_on_memory_pressure or invalidate_tables_timeout_s is turned on Note that invalidating a table doesn't require holding the lock of the HdfsTable object, so it can finish even if there are ongoing updates on the table. The updated HdfsTable object won't be added to the metadata cache since it has been replaced with an IncompleteTable object. It's only used in the DDL/DML/REFRESH responses. In local catalog mode, the response is a minimal representation, mostly the table name and catalog version, so we don't need the updates on the HdfsTable object to finish. Thus, we can consider aborting the reloading of such DDL/DML/REFRESH requests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
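A minimal sketch of the abort idea described above. This is not Impala's real code; the classes and the cache here are hypothetical stand-ins. The point is that a reload can publish its result with a compare-and-swap against the cache entry it started from, so a concurrent invalidation (which swaps in an IncompleteTable) causes the reload result to be discarded instead of finished.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hedged sketch: HdfsTable/IncompleteTable here are empty stand-ins, and
// 'cache' models the catalog's table cache. None of this is Impala's API.
public class AbortStaleReload {
  interface Table {}
  static class HdfsTable implements Table {}
  static class IncompleteTable implements Table {}

  static final Map<String, Table> cache = new ConcurrentHashMap<>();

  // Publish the reloaded table only if the cache still holds the exact
  // object the reload started from; returns false (abort) otherwise.
  static boolean publishReload(String name, Table loadedFrom, Table reloaded) {
    // replace(k, old, new) compares with equals(); these classes don't
    // override equals(), so this is an identity check.
    return cache.replace(name, loadedFrom, reloaded);
  }

  public static void main(String[] args) {
    HdfsTable t1 = new HdfsTable();
    cache.put("db.tbl", t1);

    // A concurrent INVALIDATE METADATA replaces the entry.
    cache.put("db.tbl", new IncompleteTable());

    // The in-flight reload now aborts instead of publishing a stale result.
    System.out.println(publishReload("db.tbl", t1, new HdfsTable())); // false
  }
}
```

In local-catalog mode the DDL/DML/REFRESH response only needs the table name and catalog version, so aborting at this point loses nothing the client depends on.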
[jira] [Created] (IMPALA-13115) Always add the query id in the error message to clients
Quanlong Huang created IMPALA-13115: --- Summary: Always add the query id in the error message to clients Key: IMPALA-13115 URL: https://issues.apache.org/jira/browse/IMPALA-13115 Project: IMPALA Issue Type: Improvement Components: Backend Reporter: Quanlong Huang We have some errors like "Failed due to unreachable impalad(s)". We should improve them to mention the query id, e.g. "Query ${query_id} failed due to unreachable impalad(s)". In a busy cluster, queries are flushed out quickly in the /queries page. Coordinator logs are also flushed out quickly. It's hard to find the query id there. -- This message was sent by Atlassian Jira (v8.20.10#820010)
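The change proposed above amounts to prefixing client-facing error strings with the query id. A trivial hedged sketch (the helper name and the query id below are hypothetical, not Impala's actual code):

```java
// Hedged sketch: always carry the query id into the message sent to clients,
// so the query can be found even after the /queries page and coordinator
// logs have rotated. The id shown is a made-up example.
public class QueryIdInErrors {
  static String withQueryId(String queryId, String msg) {
    return "Query " + queryId + " " + msg;
  }

  public static void main(String[] args) {
    System.out.println(withQueryId("1a2b3c4d5e6f7890:abcdef0123456789",
        "failed due to unreachable impalad(s)"));
  }
}
```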
[jira] [Assigned] (IMPALA-12834) Add query load information to the query profile
[ https://issues.apache.org/jira/browse/IMPALA-12834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang reassigned IMPALA-12834: --- Assignee: YifanZhang > Add query load information to the query profile > --- > > Key: IMPALA-12834 > URL: https://issues.apache.org/jira/browse/IMPALA-12834 > Project: IMPALA > Issue Type: Improvement > Components: Perf Investigation >Reporter: YifanZhang >Assignee: YifanZhang >Priority: Minor > Fix For: Impala 4.4.0 > > > Add query load information to the query profile to track whether a performance > regression is related to insufficient resources on the node, and also > recommend whether the current pool configurations or host configurations are > optimal. > The load information should include: > * Number of running queries of the executor group on which the query is > scheduled > * Number of running fragment instances of the hosts on which the query is > scheduled > * Used/Reserved memory of the hosts on which the query is scheduled > * Some other useful metrics -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IMPALA-12182) Add CPU utilization time series graph for RuntimeProfile's sampled values
[ https://issues.apache.org/jira/browse/IMPALA-12182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-12182: Fix Version/s: Impala 4.3.0 > Add CPU utilization time series graph for RuntimeProfile's sampled values > - > > Key: IMPALA-12182 > URL: https://issues.apache.org/jira/browse/IMPALA-12182 > Project: IMPALA > Issue Type: New Feature >Reporter: Surya Hebbar >Assignee: Surya Hebbar >Priority: Major > Fix For: Impala 4.3.0 > > Attachments: 23-07-10_T15_33_44.png, 23-07-10_T15_36_26.png, > 23-07-10_T15_39_01.png, 23-07-10_T15_39_31.png, 23-07-10_T15_40_42.png, > 23-07-10_T15_40_50.png, 23-07-10_T15_40_55.png, cpu_utilization.png, > cpu_utilization_test-1.png, cpu_utilization_test-2.png, query_timeline.mkv, > simplescreenrecorder-2023-07-10_21.10.58.mkv, > simplescreenrecorder-2023-07-10_22.10.18.mkv, three_nodes.png, > three_nodes_zoomed_out.png, timeseries_cpu_utilization_line_plot.mkv, > two_nodes.png > > > The RuntimeProfile contains samples of CPU utilization metrics for user, sys > and iowait clamped to 64 values (retrieved from the ChunkedTimeSeriesCounter, > but sampled similar to SamplingTimeSeriesCounter). > It would be helpful to see the recent aggregate CPU node utilization samples > for each of the different nodes. > These are sampled every `periodic_counter_update_period_ms`. > AggregatedRuntimeProfile used in the Thrift profile contains the complete > series of values from the ChunkedTimeSeriesCounter samples. But, as this > representation is difficult to provide in the JSON, they have been > downsampled to 64 values. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IMPALA-12364) Display disk and network metrics in webUI's query timeline
[ https://issues.apache.org/jira/browse/IMPALA-12364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-12364: Fix Version/s: Impala 4.4.0 > Display disk and network metrics in webUI's query timeline > -- > > Key: IMPALA-12364 > URL: https://issues.apache.org/jira/browse/IMPALA-12364 > Project: IMPALA > Issue Type: Improvement >Reporter: Surya Hebbar >Assignee: Surya Hebbar >Priority: Major > Fix For: Impala 4.4.0 > > Attachments: average_disk_network_metrics.mkv, > averaged_disk_network_metrics.png, both_charts_resize.mkv, > both_charts_resize.png, close_cpu_utilization_button.mkv, > draggable_resize_handle.png, hor_zoom_buttons.png, > horizontal_zoom_buttons.mkv, host_utilization_chart_resize.mkv, > host_utilization_close_button.png, host_utilization_resize_bar.png, > multiple_fragment_metrics.png, resize_drag_handle.mkv > > > It would be helpful to display disk and network usage in human readable form > on the query timeline, aligning it along with the CPU utilization plot, below > the fragment timing diagram. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IMPALA-11915) Support timeline and graphical plan exports in the webUI
[ https://issues.apache.org/jira/browse/IMPALA-11915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-11915: Fix Version/s: Impala 4.3.0 > Support timeline and graphical plan exports in the webUI > > > Key: IMPALA-11915 > URL: https://issues.apache.org/jira/browse/IMPALA-11915 > Project: IMPALA > Issue Type: New Feature >Reporter: Quanlong Huang >Assignee: Surya Hebbar >Priority: Major > Labels: supportability > Fix For: Impala 4.3.0 > > Attachments: export_button.png, export_modal.png, > export_plan_example_70b4ecc5f6aec963e_85221a3b_plan.html, > export_timeline_example_0b4ecc5f6aec963e_85221a3b_timeline.svg, > exported_plan.png, exported_timeline.png, plan_download.png, > plan_download_button.png, plan_export.png, plan_export_modal.png, > plan_export_text_selection.png, svg_wrapped_export.html, text_selection.png, > timeline_download-1.png, timeline_download.png, timeline_download_button.png, > timeline_export.png, timeline_export_modal.png, > timeline_export_text_selection-1.png, timeline_export_text_selection.png > > > The graphical plan in the web UI is useful. It'd be nice to provide a button > to download the svg picture. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IMPALA-12178) Refined alignment of timeticks in the webUI timeline
[ https://issues.apache.org/jira/browse/IMPALA-12178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-12178: Fix Version/s: Impala 4.3.0 > Refined alignment of timeticks in the webUI timeline > > > Key: IMPALA-12178 > URL: https://issues.apache.org/jira/browse/IMPALA-12178 > Project: IMPALA > Issue Type: Improvement >Reporter: Surya Hebbar >Assignee: Surya Hebbar >Priority: Minor > Fix For: Impala 4.3.0 > > Attachments: overflowed_timetick_label.png, timetick_label_fixed.png > > > The timeticks on the query timeline page in the WebUI were partially being > hidden due to the overflow for long timestamps after SVG rendering. > It would be better if the entire timetick label is displayed appropriately. > !overflowed_timetick_label.png|width=808,height=259! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IMPALA-13102) Loading tables with illegal stats failed
[ https://issues.apache.org/jira/browse/IMPALA-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang resolved IMPALA-13102. - Fix Version/s: Impala 4.5.0 Resolution: Fixed > Loading tables with illegal stats failed > > > Key: IMPALA-13102 > URL: https://issues.apache.org/jira/browse/IMPALA-13102 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Critical > Fix For: Impala 4.5.0 > > > When the table has illegal stats, e.g. numDVs=-100, Impala can't load the > table. So DROP STATS or DROP TABLE can't be performed on the table. > {code:sql} > [localhost:21050] default> drop stats alltypes_bak; > Query: drop stats alltypes_bak > ERROR: AnalysisException: Failed to load metadata for table: 'alltypes_bak' > CAUSED BY: TableLoadingException: Failed to load metadata for table: > default.alltypes_bak > CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, > avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, > numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}{code} > We should allow at least dropping the stats or dropping the table, so users > can use Impala to recover the stats.
> Stacktrace in the logs: > {noformat} > I0520 08:00:56.661746 17543 jni-util.cc:321] > 5343142d1173494f:44dcde8c] > org.apache.impala.common.AnalysisException: Failed to load metadata for > table: 'alltypes_bak' > at > org.apache.impala.analysis.Analyzer.resolveTableRef(Analyzer.java:974) > at > org.apache.impala.analysis.DropStatsStmt.analyze(DropStatsStmt.java:94) > at > org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:551) > at > org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:498) > at > org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2542) > at > org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224) > at > org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985) > at > org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175) > Caused by: org.apache.impala.catalog.TableLoadingException: Failed to load > metadata for table: default.alltypes_bak > CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, > avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, > numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1} > at > org.apache.impala.catalog.IncompleteTable.loadFromThrift(IncompleteTable.java:162) > at org.apache.impala.catalog.Table.fromThrift(Table.java:586) > at > org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:479) > at > org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334) > at > org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262) > at > org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114) > at > org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:585) > at > org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196) > at .: > org.apache.impala.catalog.TableLoadingException: Failed to load metadata for > table: default.alltypes_bak > at 
org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1318) > at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1213) > at org.apache.impala.catalog.TableLoader.load(TableLoader.java:145) > at > org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:251) > at > org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:247) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:750) > Caused by: java.lang.IllegalStateException: ColumnStats{avgSize_=4.0, > avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, > numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1} > at > com.google.common.base.Preconditions.checkState(Preconditions.java:512) > at > org.apache.impala.catalog.ColumnStats.validate(ColumnStats.java:1034) > at org.apache.impala.catalog.ColumnStats.update(ColumnStats.java:676) > at org.apache.impala.catalog.Column.updateStats(Column.java:73) > at > org.apache.impala.catalog.FeCatalogUtils.injectColumnStats(FeCatalogUtils.java:183) > at
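The Preconditions.checkState call in ColumnStats.validate (in the stacktrace above) fails the whole table load on one bad stat value. One lenient alternative, sketched below under the assumption that -1 is Impala's "unknown" sentinel for NDV: clamp illegal negative values to unknown instead of throwing, so the table still loads and DROP STATS / DROP TABLE remain possible. This is a hypothetical sketch, not the actual fix that was merged.

```java
// Hedged sketch: treat an illegal stat value (e.g. numDVs=-100) as
// "unknown" (-1) instead of throwing IllegalStateException during load.
public class LenientColumnStats {
  // -1 means "unknown" by convention; any other negative value is illegal
  // and is clamped to unknown rather than rejected.
  static long sanitizeNumDistinct(long numDistinct) {
    return numDistinct < -1 ? -1 : numDistinct;
  }

  public static void main(String[] args) {
    System.out.println(sanitizeNumDistinct(-100)); // -1 (clamped to unknown)
    System.out.println(sanitizeNumDistinct(7300)); // 7300 (kept as-is)
  }
}
```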
[jira] [Commented] (IMPALA-12190) Renaming table will cause losing privileges for non-admin users
[ https://issues.apache.org/jira/browse/IMPALA-12190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848508#comment-17848508 ] Quanlong Huang commented on IMPALA-12190: - Column masking and row filtering policies will also be messed up by RENAME. I think tag-based policies will also be messed up if data lineages are not updated accordingly. +1 for a new Ranger API that returns all policies matching a given table (and optionally for a given user). We also need this to improve IMPALA-11501 to avoid loading the table schema from HMS. Currently, to check whether a user has a corresponding column masking policy on a table, we have to load the table to get all the column names and check whether there are policies on each column, which is inefficient. > Renaming table will cause losing privileges for non-admin users > --- > > Key: IMPALA-12190 > URL: https://issues.apache.org/jira/browse/IMPALA-12190 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Gabor Kaszab >Assignee: Sai Hemanth Gantasala >Priority: Critical > Labels: alter-table, authorization, ranger > > Let's say user 'a' gets some privileges on table 't'. When this table gets > renamed (even by user 'a') then user 'a' loses its privileges on that table. 
> > Repro steps: > # Start impala with Ranger > # start impala-shell as admin (-u admin) > # create table tmp (i int, s string) stored as parquet; > # grant all on table tmp to user ; > # grant all on table tmp to user ; > {code:java} > Query: show grant user on table tmp > +++--+---++-+--+-+-+---+--+-+ > | principal_type | principal_name | database | table | column | uri | > storage_type | storage_uri | udf | privilege | grant_option | create_time | > +++--+---++-+--+-+-+---+--+-+ > | USER | | default | tmp | * | | > | | | all | false | NULL | > +++--+---++-+--+-+-+---+--+-+ > Fetched 1 row(s) in 0.01s {code} > # alter table tmp rename to tmp_1234; > # show grant user on table tmp_1234; > {code:java} > Query: show grant user on table tmp_1234 > Fetched 0 row(s) in 0.17s{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
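The per-column check described in the comment above can be sketched as follows. The RangerClient interface and method names here are hypothetical stand-ins, not the real Ranger client API; the sketch only shows why the current approach is O(number of columns) and needs a loaded schema, which is what the proposed "all policies for a table" API would avoid.

```java
import java.util.List;

// Hedged sketch of the inefficiency: one policy lookup per column, and the
// column list itself requires loading the table from HMS first.
public class PolicyCheckSketch {
  interface RangerClient {
    boolean hasMaskingPolicy(String user, String db, String table, String col);
  }

  // Current approach: iterate all columns of the loaded schema.
  static boolean anyColumnMasked(RangerClient ranger, String user, String db,
      String table, List<String> columns) {
    for (String col : columns) {
      if (ranger.hasMaskingPolicy(user, db, table, col)) return true;
    }
    return false;
  }

  public static void main(String[] args) {
    // Toy client: only the hypothetical 'ssn' column has a masking policy.
    RangerClient ranger = (u, d, t, c) -> c.equals("ssn");
    System.out.println(anyColumnMasked(ranger, "alice", "default", "tmp",
        List.of("id", "ssn"))); // true
  }
}
```

With a bulk API returning all policies matching a table (optionally filtered by user), the loop and the schema load both disappear: one call answers the question.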
[jira] [Updated] (IMPALA-13074) WRITE TO HDFS node is omitted from Web UI graphic plan
[ https://issues.apache.org/jira/browse/IMPALA-13074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-13074: Labels: ramp-up (was: ) > WRITE TO HDFS node is omitted from Web UI graphic plan > -- > > Key: IMPALA-13074 > URL: https://issues.apache.org/jira/browse/IMPALA-13074 > Project: IMPALA > Issue Type: Bug >Reporter: Noemi Pap-Takacs >Priority: Major > Labels: ramp-up > > The query plan shows the nodes that take part in the execution, forming a > tree structure. > It can be displayed in the CLI by issuing the EXPLAIN command. When > the actual query is executed, the plan tree can also be viewed in the Impala > Web UI in a graphic form. > However, the explain string and the graphic plan tree do not match: the top > node is missing from the Web UI. > This is especially confusing in case of DDL and DML statements, where the > Data Sink is not displayed. This makes a SELECT * FROM table > indistinguishable from a CREATE TABLE, since both only display the SCAN node > and omit the WRITE_TO_HDFS and SELECT node. > It would make sense to include the WRITE_TO_HDFS node in DML/DDL plans. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IMPALA-13074) WRITE TO HDFS node is omitted from Web UI graphic plan
[ https://issues.apache.org/jira/browse/IMPALA-13074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848422#comment-17848422 ] Quanlong Huang commented on IMPALA-13074: - Names like "HDFS WRITER", "KUDU WRITER" will be consistent with the ExecSummary. > WRITE TO HDFS node is omitted from Web UI graphic plan > -- > > Key: IMPALA-13074 > URL: https://issues.apache.org/jira/browse/IMPALA-13074 > Project: IMPALA > Issue Type: Bug >Reporter: Noemi Pap-Takacs >Priority: Major > > The query plan shows the nodes that take part in the execution, forming a > tree structure. > It can be displayed in the CLI by issuing the EXPLAIN command. When > the actual query is executed, the plan tree can also be viewed in the Impala > Web UI in a graphic form. > However, the explain string and the graphic plan tree do not match: the top > node is missing from the Web UI. > This is especially confusing in case of DDL and DML statements, where the > Data Sink is not displayed. This makes a SELECT * FROM table > indistinguishable from a CREATE TABLE, since both only display the SCAN node > and omit the WRITE_TO_HDFS and SELECT node. > It would make sense to include the WRITE_TO_HDFS node in DML/DDL plans. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IMPALA-13102) Loading tables with illegal stats failed
[ https://issues.apache.org/jira/browse/IMPALA-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848395#comment-17848395 ] Quanlong Huang commented on IMPALA-13102: - Uploaded a patch for review: https://gerrit.cloudera.org/c/21445/ > Loading tables with illegal stats failed > > > Key: IMPALA-13102 > URL: https://issues.apache.org/jira/browse/IMPALA-13102 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Critical > > When the table has illegal stats, e.g. numDVs=-100, Impala can't load the > table. So DROP STATS or DROP TABLE can't be performed on the table. > {code:sql} > [localhost:21050] default> drop stats alltypes_bak; > Query: drop stats alltypes_bak > ERROR: AnalysisException: Failed to load metadata for table: 'alltypes_bak' > CAUSED BY: TableLoadingException: Failed to load metadata for table: > default.alltypes_bak > CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, > avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, > numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}{code} > We should allow at least dropping the stats or dropping the table, so users > can use Impala to recover the stats.
> Stacktrace in the logs: > {noformat} > I0520 08:00:56.661746 17543 jni-util.cc:321] > 5343142d1173494f:44dcde8c] > org.apache.impala.common.AnalysisException: Failed to load metadata for > table: 'alltypes_bak' > at > org.apache.impala.analysis.Analyzer.resolveTableRef(Analyzer.java:974) > at > org.apache.impala.analysis.DropStatsStmt.analyze(DropStatsStmt.java:94) > at > org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:551) > at > org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:498) > at > org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2542) > at > org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224) > at > org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985) > at > org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175) > Caused by: org.apache.impala.catalog.TableLoadingException: Failed to load > metadata for table: default.alltypes_bak > CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, > avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, > numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1} > at > org.apache.impala.catalog.IncompleteTable.loadFromThrift(IncompleteTable.java:162) > at org.apache.impala.catalog.Table.fromThrift(Table.java:586) > at > org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:479) > at > org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334) > at > org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262) > at > org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114) > at > org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:585) > at > org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196) > at .: > org.apache.impala.catalog.TableLoadingException: Failed to load metadata for > table: default.alltypes_bak > at 
org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1318) > at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1213) > at org.apache.impala.catalog.TableLoader.load(TableLoader.java:145) > at > org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:251) > at > org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:247) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:750) > Caused by: java.lang.IllegalStateException: ColumnStats{avgSize_=4.0, > avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, > numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1} > at > com.google.common.base.Preconditions.checkState(Preconditions.java:512) > at > org.apache.impala.catalog.ColumnStats.validate(ColumnStats.java:1034) > at org.apache.impala.catalog.ColumnStats.update(ColumnStats.java:676) > at org.apache.impala.catalog.Column.updateStats(Column.java:73) > at > org.apache.impala.catalog.FeCatalogUtils.injectColumnStats(FeCatalogUtils.java:183) > at
[jira] [Created] (IMPALA-13103) Corrupt column stats are not reported
Quanlong Huang created IMPALA-13103: --- Summary: Corrupt column stats are not reported Key: IMPALA-13103 URL: https://issues.apache.org/jira/browse/IMPALA-13103 Project: IMPALA Issue Type: Bug Components: Frontend Reporter: Quanlong Huang

Impala reports corrupt table stats in the query plan. However, corrupt column stats are not reported. For instance, consider the following table:
{code:sql}
create table t1 (id int, name string);
insert into t1 values (1, 'aaa'), (2, 'aaa'), (3, 'aaa'), (4, 'aaa');
{code}
with the following stats:
{code:sql}
alter table t1 set tblproperties('numRows'='4');
alter table t1 set column stats name ('numNulls'='0');
{code}
Note that column "id" has missing stats and column "name" has missing/corrupt stats (ndv=-1, numNulls=0). Grouping by "id" reports the missing stats:
{code:sql}
explain select id, count(*) from t1 group by id;
WARNING: The following tables are missing relevant table and/or column statistics.
default.t1
{code}
However, grouping by "name" doesn't report the missing/corrupt stats:
{noformat}
explain select name, count(*) from t1 group by name;
Max Per-Host Resource Reservation: Memory=38.00MB Threads=2
Per-Host Resource Estimates: Memory=144MB
Codegen disabled by planner
Analyzed query: SELECT name, count(*) FROM `default`.t1 GROUP BY name

F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
Per-Host Resources: mem-estimate=144.00MB mem-reservation=38.00MB thread-reservation=2
PLAN-ROOT SINK
|  output exprs: name, count(*)
|  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
|
01:AGGREGATE [FINALIZE]
|  output: count(*)
|  group by: name
|  mem-estimate=128.00MB mem-reservation=34.00MB spill-buffer=2.00MB thread-reservation=0
|  tuple-ids=1 row-size=20B cardinality=4
|  in pipelines: 01(GETNEXT), 00(OPEN)
|
00:SCAN HDFS [default.t1]
   HDFS partitions=1/1 files=1 size=24B
   stored statistics:
     table: rows=4 size=unavailable
     columns: all
   extrapolated-rows=disabled max-scan-range-rows=4
   mem-estimate=16.00MB mem-reservation=8.00KB thread-reservation=1
   tuple-ids=0 row-size=12B cardinality=4
   in pipelines: 00(GETNEXT)
{noformat}
CC [~rizaon] -- This message was sent by Atlassian Jira (v8.20.10#820010)
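The missing/corrupt combination described above (ndv=-1 with numNulls=0) can be captured by a small check. This is a hypothetical sketch in Python, not Impala's actual frontend API; the `ColStats` type and function names are invented for illustration:

```python
# Sketch: flag group-by columns whose stats are missing or corrupt, mirroring
# the warning Impala already emits for missing table stats.
from dataclasses import dataclass

@dataclass
class ColStats:
    num_dvs: int      # NDV of the column; -1 means "unknown"
    num_nulls: int    # number of NULLs; -1 means "unknown"

def is_missing_or_corrupt(stats: ColStats) -> bool:
    # ndv == -1 means the column was never analyzed; any negative NDV is
    # missing/corrupt. ndv=-1 together with numNulls=0 is exactly the case
    # from the report that the planner currently fails to warn about.
    if stats.num_dvs < 0:
        return True
    # numNulls below -1 is also corrupt (-1 alone just means "unknown").
    return stats.num_nulls < -1

def columns_to_warn(grouping_cols: dict[str, ColStats]) -> list[str]:
    return [name for name, s in grouping_cols.items() if is_missing_or_corrupt(s)]
```

With the example table, `columns_to_warn` would flag both "id" (no stats at all) and "name" (ndv=-1, numNulls=0), while a fully analyzed column passes.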
[jira] [Commented] (IMPALA-13102) Loading tables with illegal stats failed
[ https://issues.apache.org/jira/browse/IMPALA-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847742#comment-17847742 ] Quanlong Huang commented on IMPALA-13102: - In the Impala dev env, I can set the stats directly in PostgreSQL:
{code:sql}
psql -q -U hiveuser -d ${METASTORE_DB}
HMS_home_quanlong_workspace_Impala_cdp=> select "TBL_ID" from "TBLS" where "TBL_NAME" = 'alltypes_bak';
 TBL_ID
 244931
(1 row)
HMS_home_quanlong_workspace_Impala_cdp=> select "CS_ID", "DB_NAME", "TABLE_NAME", "COLUMN_NAME", "NUM_DISTINCTS" from "TAB_COL_STATS" where "TBL_ID" = 244931;
 CS_ID | DB_NAME | TABLE_NAME   | COLUMN_NAME     | NUM_DISTINCTS
 68767 | default | alltypes_bak | double_col      |   10
 68766 | default | alltypes_bak | id              | 7300
 68765 | default | alltypes_bak | tinyint_col     |   10
 68764 | default | alltypes_bak | timestamp_col   | 7300
 68763 | default | alltypes_bak | smallint_col    |   10
 68762 | default | alltypes_bak | date_string_col |  736
 68761 | default | alltypes_bak | string_col      |   10
 68760 | default | alltypes_bak | float_col       |   10
 68759 | default | alltypes_bak | bigint_col      |   10
 68758 | default | alltypes_bak | year            |    2
 68757 | default | alltypes_bak | bool_col        |
 68756 | default | alltypes_bak | int_col         |   10
(12 rows)
HMS_home_quanlong_workspace_Impala_cdp=> UPDATE "TAB_COL_STATS" SET "NUM_DISTINCTS" = -100 where "CS_ID" = 68766;
HMS_home_quanlong_workspace_Impala_cdp=> select "CS_ID", "DB_NAME", "TABLE_NAME", "COLUMN_NAME", "NUM_DISTINCTS" from "TAB_COL_STATS" where "CS_ID" = 68766;
 CS_ID | DB_NAME | TABLE_NAME   | COLUMN_NAME | NUM_DISTINCTS
 68766 | default | alltypes_bak | id          | -100
(1 row)
{code}
> Loading tables with illegal stats failed
>
> Key: IMPALA-13102
> URL: https://issues.apache.org/jira/browse/IMPALA-13102
> Project: IMPALA
> Issue Type: Bug
> Components: Catalog
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
>
> When the table has illegal stats, e.g. numDVs=-100, Impala can't load the table. So DROP STATS or DROP TABLE can't be performed on the table.
> {code:sql}
> [localhost:21050] default> drop stats alltypes_bak;
> Query: drop stats alltypes_bak
> ERROR: AnalysisException: Failed to load metadata for table: 'alltypes_bak'
> CAUSED BY: TableLoadingException: Failed to load metadata for table: default.alltypes_bak
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}{code}
> We should allow at least dropping the stats or dropping the table, so the user can use Impala to recover the stats.
> Stacktrace in the logs:
> {noformat}
> I0520 08:00:56.661746 17543 jni-util.cc:321] 5343142d1173494f:44dcde8c] org.apache.impala.common.AnalysisException: Failed to load metadata for table: 'alltypes_bak'
>     at org.apache.impala.analysis.Analyzer.resolveTableRef(Analyzer.java:974)
>     at org.apache.impala.analysis.DropStatsStmt.analyze(DropStatsStmt.java:94)
>     at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:551)
>     at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:498)
>     at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2542)
>     at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
>     at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
>     at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175)
> Caused by: org.apache.impala.catalog.TableLoadingException: Failed to load metadata for table: default.alltypes_bak
> CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
>     at org.apache.impala.catalog.IncompleteTable.loadFromThrift(IncompleteTable.java:162)
>     at org.apache.impala.catalog.Table.fromThrift(Table.java:586)
>     at org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:479)
>     at org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334)
>     at org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262)
>     at
[jira] [Updated] (IMPALA-13102) Loading tables with illegal stats failed
[ https://issues.apache.org/jira/browse/IMPALA-13102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-13102: Description:
When the table has illegal stats, e.g. numDVs=-100, Impala can't load the table. So DROP STATS or DROP TABLE can't be performed on the table.
{code:sql}
[localhost:21050] default> drop stats alltypes_bak;
Query: drop stats alltypes_bak
ERROR: AnalysisException: Failed to load metadata for table: 'alltypes_bak'
CAUSED BY: TableLoadingException: Failed to load metadata for table: default.alltypes_bak
CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}{code}
We should allow at least dropping the stats or dropping the table, so the user can use Impala to recover the stats.
Stacktrace in the logs:
{noformat}
I0520 08:00:56.661746 17543 jni-util.cc:321] 5343142d1173494f:44dcde8c] org.apache.impala.common.AnalysisException: Failed to load metadata for table: 'alltypes_bak'
    at org.apache.impala.analysis.Analyzer.resolveTableRef(Analyzer.java:974)
    at org.apache.impala.analysis.DropStatsStmt.analyze(DropStatsStmt.java:94)
    at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:551)
    at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:498)
    at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2542)
    at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
    at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
    at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175)
Caused by: org.apache.impala.catalog.TableLoadingException: Failed to load metadata for table: default.alltypes_bak
CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
    at org.apache.impala.catalog.IncompleteTable.loadFromThrift(IncompleteTable.java:162)
    at org.apache.impala.catalog.Table.fromThrift(Table.java:586)
    at org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:479)
    at org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334)
    at org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262)
    at org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114)
    at org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:585)
    at org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196)
    at .: org.apache.impala.catalog.TableLoadingException: Failed to load metadata for table: default.alltypes_bak
    at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1318)
    at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1213)
    at org.apache.impala.catalog.TableLoader.load(TableLoader.java:145)
    at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:251)
    at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:247)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalStateException: ColumnStats{avgSize_=4.0, avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
    at com.google.common.base.Preconditions.checkState(Preconditions.java:512)
    at org.apache.impala.catalog.ColumnStats.validate(ColumnStats.java:1034)
    at org.apache.impala.catalog.ColumnStats.update(ColumnStats.java:676)
    at org.apache.impala.catalog.Column.updateStats(Column.java:73)
    at org.apache.impala.catalog.FeCatalogUtils.injectColumnStats(FeCatalogUtils.java:183)
    at org.apache.impala.catalog.Table.loadAllColumnStats(Table.java:513)
    at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1269)
    ... 8 more{noformat}
CC [~VenuReddy] [~hemanth619] [~ngangam]

was:
When the table has illegal stats, e.g. numDVs=-100, Impala can't load the table. So DROP STATS or DROP TABLE can't be performed on the table.
{code:sql}
[localhost:21050] default> drop stats alltypes_bak;
Query: drop stats alltypes_bak
ERROR: AnalysisException: Failed to load metadata for table: 'alltypes_bak'
CAUSED BY: TableLoadingException: Failed to load metadata for table: default.alltypes_bak
CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0,
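One possible direction for the fix suggested above ("allow at least dropping the stats or dropping the table"): instead of letting `ColumnStats.validate` fail the whole table load, an illegal value could be clamped to the "unknown" sentinel. This is a hedged Python sketch of the idea, not Impala's actual code; the function name is invented:

```python
def sanitize_ndv(ndv: int) -> int:
    # Impala uses -1 as the "stats unknown" sentinel for NDV. An illegal
    # value such as -100 (seen in this issue) could be clamped to -1 so
    # that the table still loads and DROP STATS / DROP TABLE remain
    # possible, rather than raising IllegalStateException during load.
    return ndv if ndv >= -1 else -1
```

Valid values pass through unchanged; only out-of-range values are degraded to "unknown".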
[jira] [Created] (IMPALA-13102) Loading tables with illegal stats failed
Quanlong Huang created IMPALA-13102: --- Summary: Loading tables with illegal stats failed Key: IMPALA-13102 URL: https://issues.apache.org/jira/browse/IMPALA-13102 Project: IMPALA Issue Type: Bug Components: Catalog Reporter: Quanlong Huang Assignee: Quanlong Huang

When the table has illegal stats, e.g. numDVs=-100, Impala can't load the table. So DROP STATS or DROP TABLE can't be performed on the table.
{code:sql}
[localhost:21050] default> drop stats alltypes_bak;
Query: drop stats alltypes_bak
ERROR: AnalysisException: Failed to load metadata for table: 'alltypes_bak'
CAUSED BY: TableLoadingException: Failed to load metadata for table: default.alltypes_bak
CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}{code}
We should allow at least dropping the stats or dropping the table, so the user can use Impala to recover the stats.
Stacktrace in the logs:
{noformat}
I0520 08:00:56.661746 17543 jni-util.cc:321] 5343142d1173494f:44dcde8c] org.apache.impala.common.AnalysisException: Failed to load metadata for table: 'alltypes_bak'
    at org.apache.impala.analysis.Analyzer.resolveTableRef(Analyzer.java:974)
    at org.apache.impala.analysis.DropStatsStmt.analyze(DropStatsStmt.java:94)
    at org.apache.impala.analysis.AnalysisContext.analyze(AnalysisContext.java:551)
    at org.apache.impala.analysis.AnalysisContext.analyzeAndAuthorize(AnalysisContext.java:498)
    at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:2542)
    at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:2224)
    at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1985)
    at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:175)
Caused by: org.apache.impala.catalog.TableLoadingException: Failed to load metadata for table: default.alltypes_bak
CAUSED BY: IllegalStateException: ColumnStats{avgSize_=4.0, avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
    at org.apache.impala.catalog.IncompleteTable.loadFromThrift(IncompleteTable.java:162)
    at org.apache.impala.catalog.Table.fromThrift(Table.java:586)
    at org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:479)
    at org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334)
    at org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262)
    at org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:114)
    at org.apache.impala.service.Frontend.updateCatalogCache(Frontend.java:585)
    at org.apache.impala.service.JniFrontend.updateCatalogCache(JniFrontend.java:196)
    at .: org.apache.impala.catalog.TableLoadingException: Failed to load metadata for table: default.alltypes_bak
    at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1318)
    at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1213)
    at org.apache.impala.catalog.TableLoader.load(TableLoader.java:145)
    at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:251)
    at org.apache.impala.catalog.TableLoadingMgr$2.call(TableLoadingMgr.java:247)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.IllegalStateException: ColumnStats{avgSize_=4.0, avgSerializedSize_=4.0, maxSize_=4, numDistinct_=-100, numNulls_=0, numTrues=-1, numFalses=-1, lowValue=-1, highValue=-1}
    at com.google.common.base.Preconditions.checkState(Preconditions.java:512)
    at org.apache.impala.catalog.ColumnStats.validate(ColumnStats.java:1034)
    at org.apache.impala.catalog.ColumnStats.update(ColumnStats.java:676)
    at org.apache.impala.catalog.Column.updateStats(Column.java:73)
    at org.apache.impala.catalog.FeCatalogUtils.injectColumnStats(FeCatalogUtils.java:183)
    at org.apache.impala.catalog.Table.loadAllColumnStats(Table.java:513)
    at org.apache.impala.catalog.HdfsTable.load(HdfsTable.java:1269)
    ... 8 more{noformat}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-13094) Query links in /admission page of admissiond doesn't work
Quanlong Huang created IMPALA-13094: --- Summary: Query links in /admission page of admissiond doesn't work Key: IMPALA-13094 URL: https://issues.apache.org/jira/browse/IMPALA-13094 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Quanlong Huang Attachments: Selection_115.png, Selection_116.png

In the /admission page, there are records for queued queries and running queries. The details links for these queries use the hostname of the admissiond; instead, they should point to the corresponding coordinators. Clicking on a link jumps to the /query_plan endpoint of the admissiond, which doesn't exist, so it fails with "Error: No URI handler for '/query_plan'". Screenshots are attached for reference. CC [~arawat] -- This message was sent by Atlassian Jira (v8.20.10#820010)
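The fix described above amounts to building each details link from the query's coordinator address instead of the admissiond's own hostname. A minimal sketch, assuming the default impalad WebUI port 25000 and the /query_plan endpoint mentioned in the report (the function name and parameters are hypothetical, not Impala's actual code):

```python
def query_detail_link(coordinator_host: str, query_id: str,
                      webui_port: int = 25000) -> str:
    # Link to the coordinator that owns the query, not the admissiond:
    # the admissiond has no /query_plan handler, which is why the current
    # links fail with "No URI handler for '/query_plan'".
    return f"http://{coordinator_host}:{webui_port}/query_plan?query_id={query_id}"
```

For a query queued on coordinator `coord1.example.com` (a made-up host), the link would target that host's WebUI rather than the admissiond's.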
[jira] [Updated] (IMPALA-13094) Query links in /admission page of admissiond doesn't work
[ https://issues.apache.org/jira/browse/IMPALA-13094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-13094: Attachment: Selection_116.png
> Query links in /admission page of admissiond doesn't work
> -
>
> Key: IMPALA-13094
> URL: https://issues.apache.org/jira/browse/IMPALA-13094
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Quanlong Huang
> Priority: Critical
> Attachments: Selection_115.png, Selection_116.png
>
> In the /admission page, there are records for queued queries and running queries. The details links for these queries use the hostname of the admissiond; instead, they should point to the corresponding coordinators. Clicking on a link jumps to the /query_plan endpoint of the admissiond, which doesn't exist, so it fails with "Error: No URI handler for '/query_plan'". Screenshots are attached for reference.
> CC [~arawat]
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IMPALA-13094) Query links in /admission page of admissiond doesn't work
[ https://issues.apache.org/jira/browse/IMPALA-13094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-13094: Attachment: Selection_115.png
> Query links in /admission page of admissiond doesn't work
> -
>
> Key: IMPALA-13094
> URL: https://issues.apache.org/jira/browse/IMPALA-13094
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Quanlong Huang
> Priority: Critical
> Attachments: Selection_115.png, Selection_116.png
>
> In the /admission page, there are records for queued queries and running queries. The details links for these queries use the hostname of the admissiond; instead, they should point to the corresponding coordinators. Clicking on a link jumps to the /query_plan endpoint of the admissiond, which doesn't exist, so it fails with "Error: No URI handler for '/query_plan'". Screenshots are attached for reference.
> CC [~arawat]
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13093) Insert into Huawei OBS table failed
Quanlong Huang created IMPALA-13093: --- Summary: Insert into Huawei OBS table failed Key: IMPALA-13093 URL: https://issues.apache.org/jira/browse/IMPALA-13093 Project: IMPALA Issue Type: Bug Components: Backend Affects Versions: Impala 4.3.0 Reporter: Quanlong Huang Assignee: Quanlong Huang

Inserting into a table that uses Huawei OBS (Object Storage Service) as its storage fails with the following error:
{noformat}
Query: insert into test_obs1 values (1, 'abc')
ERROR: Failed to get info on temporary HDFS file: obs://obs-test-ee93/input/test_obs1/_impala_insert_staging/fe4ac1be6462a13f_362a9b5b/.fe4ac1be6462a13f-362a9b5b_1213692075_dir//fe4ac1be6462a13f-362a9b5b_375832652_data.0.txt
Error(2): No such file or directory
{noformat}
Looking into the logs:
{noformat}
I0516 16:40:55.663640 18922 status.cc:129] fe4ac1be6462a13f:362a9b5b] Failed to get info on temporary HDFS file: obs://obs-test-ee93/input/test_obs1/_impala_insert_staging/fe4ac1be6462a13f_362a9b5b/.fe4ac1be6462a13f-362a9b5b_1213692075_dir//fe4ac1be6462a13f-362a9b5b_375832652_data.0.txt
Error(2): No such file or directory
    @ 0xfc6d44 impala::Status::Status()
    @ 0x1c42020 impala::HdfsTableSink::CreateNewTmpFile()
    @ 0x1c44357 impala::HdfsTableSink::InitOutputPartition()
    @ 0x1c4988a impala::HdfsTableSink::GetOutputPartition()
    @ 0x1c46569 impala::HdfsTableSink::Send()
    @ 0x14ee25f impala::FragmentInstanceState::ExecInternal()
    @ 0x14efca3 impala::FragmentInstanceState::Exec()
    @ 0x148dc4c impala::QueryState::ExecFInstance()
    @ 0x1b3bab9 impala::Thread::SuperviseThread()
    @ 0x1b3cdb1 boost::detail::thread_data<>::run()
    @ 0x2474a87 thread_proxy
    @ 0x7fe5a562dea5 start_thread
    @ 0x7fe5a25ddb0d __clone{noformat}
Note that impalad is started with {{--symbolize_stacktrace=true}} so the stacktrace has symbols. -- This message was sent by Atlassian Jira (v8.20.10#820010)
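One detail worth noting in the error above: the staging path contains a doubled slash ("..._dir//fe4ac1be..."). Whether that is the root cause here is not established, but some object stores treat "a//b" and "a/b" as different keys, so path joins that can emit "//" are a common suspect. A hypothetical sketch of the kind of join that avoids it (not Impala's actual code):

```python
def join_path(dir_path: str, file_name: str) -> str:
    # Avoid emitting "dir//file" when dir_path already ends with '/';
    # object-store backends may not normalize the doubled slash the way
    # HDFS does.
    return dir_path.rstrip('/') + '/' + file_name
```

Either form of the directory argument (with or without a trailing slash) yields the same single-slash key.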
[jira] [Updated] (IMPALA-13086) Cardinality estimate of AggregationNode should consider predicates on group-by columns
[ https://issues.apache.org/jira/browse/IMPALA-13086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-13086: Attachment: plan.txt
> Cardinality estimate of AggregationNode should consider predicates on group-by columns
> --
>
> Key: IMPALA-13086
> URL: https://issues.apache.org/jira/browse/IMPALA-13086
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Reporter: Quanlong Huang
> Priority: Critical
> Attachments: plan.txt
>
> Consider the following tables:
> {code:sql}
> CREATE EXTERNAL TABLE t1(
>   t1_id bigint,
>   t5_id bigint,
>   t5_name string,
>   register_date string
> ) stored as textfile;
>
> CREATE EXTERNAL TABLE t2(
>   t1_id bigint,
>   t3_id bigint,
>   pay_time timestamp,
>   refund_time timestamp,
>   state_code int
> ) stored as textfile;
>
> CREATE EXTERNAL TABLE t3(
>   t3_id bigint,
>   t3_name string,
>   class_id int
> ) stored as textfile;
>
> CREATE EXTERNAL TABLE t5(
>   id bigint,
>   t5_id bigint,
>   t5_name string,
>   branch_id bigint,
>   branch_name string
> ) stored as textfile;
>
> alter table t1 set tblproperties('numRows'='6031170829');
> alter table t1 set column stats t1_id ('numDVs'='8131016','numNulls'='0');
> alter table t1 set column stats t5_id ('numDVs'='389','numNulls'='0');
> alter table t1 set column stats t5_name ('numDVs'='523','numNulls'='85928157','maxsize'='27','avgSize'='17.79120063781738');
> alter table t1 set column stats register_date ('numDVs'='9283','numNulls'='0','maxsize'='8','avgSize'='8');
> alter table t2 set tblproperties('numRows'='864341085');
> alter table t2 set column stats t1_id ('numDVs'='1007302','numNulls'='0');
> alter table t2 set column stats t3_id ('numDVs'='5013','numNulls'='2800503');
> alter table t2 set column stats pay_time ('numDVs'='1372020','numNulls'='0');
> alter table t2 set column stats refund_time ('numDVs'='251658','numNulls'='791645118');
> alter table t2 set column stats state_code ('numDVs'='8','numNulls'='0');
> alter table t3 set tblproperties('numRows'='4452');
> alter table t3 set column stats t3_id ('numDVs'='4452','numNulls'='0');
> alter table t3 set column stats t3_name ('numDVs'='4452','numNulls'='0','maxsize'='176','avgSize'='37.60469818115234');
> alter table t3 set column stats class_id ('numDVs'='75','numNulls'='0');
> alter table t5 set tblproperties('numRows'='2177245');
> alter table t5 set column stats t5_id ('numDVs'='826','numNulls'='0');
> alter table t5 set column stats t5_name ('numDVs'='523','numNulls'='0','maxsize'='67','avgSize'='19.12560081481934');
> alter table t5 set column stats branch_id ('numDVs'='53','numNulls'='0');
> alter table t5 set column stats branch_name ('numDVs'='55','numNulls'='0','maxsize'='61','avgSize'='16.05229949951172');
> {code}
> Put a data file into each table to make the stats valid:
> {code:bash}
> echo '2024' > data.txt
> hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t1
> hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t2
> hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t3
> hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t5
> {code}
> REFRESH these tables after adding the data files.
> The cardinalities of the AggregationNodes are overestimated in the following query:
> {code:sql}
> explain select
> register_date,
> t4.t5_id,
> t5.t5_name,
> t5.branch_name,
> count(distinct t1_id),
> count(distinct case when diff_day=0 then t1_id else null end ),
> count(distinct case when diff_day<=3 then t1_id else null end ),
> count(distinct case when diff_day<=7 then t1_id else null end ),
> count(distinct case when diff_day<=14 then t1_id else null end ),
> count(distinct case when diff_day<=30 then t1_id else null end ),
> count(distinct case when diff_day<=60 then t1_id else null end ),
> count(distinct case when pay_time is not null then t1_id else null end )
> from (
>   select t1.t1_id,t1.register_date,t1.t5_id,t2.pay_time,t2.t3_id,t3.t3_name,
>     datediff(pay_time,register_date) diff_day
>   from (
>     select t1_id,pay_time,t3_id from t2
>     where state_code = 0 and pay_time>=trunc(NOW(),'Y')
>     and cast(pay_time as date) <> cast(refund_time as date)
>   )t2
>   join t3 on t2.t3_id=t3.t3_id
>   right join t1 on t1.t1_id=t2.t1_id
> )t4
> left join t5 on t4.t5_id=t5.t5_id
> where register_date='20230515'
> group by register_date,t4.t5_id,t5.t5_name,t5.branch_name;{code}
> One of the AggregationNodes:
> {noformat}
> 17:AGGREGATE [FINALIZE]
> | Class 0
> |  output: count:merge(t1_id)
> |  group by: register_date, t4.t5_id, t5.t5_name, t5.branch_name
> | Class 1
> |  output: count:merge(CASE WHEN diff_day = 0 THEN t1_id ELSE NULL END)
> |  group
[jira] [Created] (IMPALA-13086) Cardinality estimate of AggregationNode should consider predicates on group-by columns
Quanlong Huang created IMPALA-13086: --- Summary: Cardinality estimate of AggregationNode should consider predicates on group-by columns Key: IMPALA-13086 URL: https://issues.apache.org/jira/browse/IMPALA-13086 Project: IMPALA Issue Type: Bug Components: Frontend Reporter: Quanlong Huang

Consider the following tables:
{code:sql}
CREATE EXTERNAL TABLE t1(
  t1_id bigint,
  t5_id bigint,
  t5_name string,
  register_date string
) stored as textfile;

CREATE EXTERNAL TABLE t2(
  t1_id bigint,
  t3_id bigint,
  pay_time timestamp,
  refund_time timestamp,
  state_code int
) stored as textfile;

CREATE EXTERNAL TABLE t3(
  t3_id bigint,
  t3_name string,
  class_id int
) stored as textfile;

CREATE EXTERNAL TABLE t5(
  id bigint,
  t5_id bigint,
  t5_name string,
  branch_id bigint,
  branch_name string
) stored as textfile;

alter table t1 set tblproperties('numRows'='6031170829');
alter table t1 set column stats t1_id ('numDVs'='8131016','numNulls'='0');
alter table t1 set column stats t5_id ('numDVs'='389','numNulls'='0');
alter table t1 set column stats t5_name ('numDVs'='523','numNulls'='85928157','maxsize'='27','avgSize'='17.79120063781738');
alter table t1 set column stats register_date ('numDVs'='9283','numNulls'='0','maxsize'='8','avgSize'='8');
alter table t2 set tblproperties('numRows'='864341085');
alter table t2 set column stats t1_id ('numDVs'='1007302','numNulls'='0');
alter table t2 set column stats t3_id ('numDVs'='5013','numNulls'='2800503');
alter table t2 set column stats pay_time ('numDVs'='1372020','numNulls'='0');
alter table t2 set column stats refund_time ('numDVs'='251658','numNulls'='791645118');
alter table t2 set column stats state_code ('numDVs'='8','numNulls'='0');
alter table t3 set tblproperties('numRows'='4452');
alter table t3 set column stats t3_id ('numDVs'='4452','numNulls'='0');
alter table t3 set column stats t3_name ('numDVs'='4452','numNulls'='0','maxsize'='176','avgSize'='37.60469818115234');
alter table t3 set column stats class_id ('numDVs'='75','numNulls'='0');
alter table t5 set tblproperties('numRows'='2177245'); alter table t5 set column stats t5_id ('numDVs'='826','numNulls'='0'); alter table t5 set column stats t5_name ('numDVs'='523','numNulls'='0','maxsize'='67','avgSize'='19.12560081481934'); alter table t5 set column stats branch_id ('numDVs'='53','numNulls'='0'); alter table t5 set column stats branch_name ('numDVs'='55','numNulls'='0','maxsize'='61','avgSize'='16.05229949951172'); {code} Put a data file in each table to make the stats valid: {code:bash} echo '2024' > data.txt hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t1 hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t2 hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t3 hdfs dfs -put data.txt hdfs://localhost:20500/test-warehouse/lab2.db/t5 {code} REFRESH these tables after adding the data files. The cardinality of the AggregationNodes is overestimated in the following query: {code:sql} explain select register_date, t4.t5_id, t5.t5_name, t5.branch_name, count(distinct t1_id), count(distinct case when diff_day=0 then t1_id else null end ), count(distinct case when diff_day<=3 then t1_id else null end ), count(distinct case when diff_day<=7 then t1_id else null end ), count(distinct case when diff_day<=14 then t1_id else null end ), count(distinct case when diff_day<=30 then t1_id else null end ), count(distinct case when diff_day<=60 then t1_id else null end ), count(distinct case when pay_time is not null then t1_id else null end ) from ( select t1.t1_id,t1.register_date,t1.t5_id,t2.pay_time,t2.t3_id,t3.t3_name, datediff(pay_time,register_date) diff_day from ( select t1_id,pay_time,t3_id from t2 where state_code = 0 and pay_time>=trunc(NOW(),'Y') and cast(pay_time as date) <> cast(refund_time as date) )t2 join t3 on t2.t3_id=t3.t3_id right join t1 on t1.t1_id=t2.t1_id )t4 left join t5 on t4.t5_id=t5.t5_id where register_date='20230515' group by 
register_date,t4.t5_id,t5.t5_name,t5.branch_name;{code} One of the AggregationNodes: {noformat} 17:AGGREGATE [FINALIZE] | Class 0 |output: count:merge(t1_id) |group by: register_date, t4.t5_id, t5.t5_name, t5.branch_name | Class 1 |output: count:merge(CASE WHEN diff_day = 0 THEN t1_id ELSE NULL END) |group by: register_date, t4.t5_id, t5.t5_name, t5.branch_name | Class 2 |output: count:merge(CASE WHEN diff_day <= 3 THEN t1_id ELSE NULL END) |group by: register_date, t4.t5_id, t5.t5_name, t5.branch_name | Class 3 |output: count:merge(CASE WHEN diff_day <= 7 THEN t1_id ELSE NULL END) |group by: register_date, t4.t5_id, t5.t5_name, t5.branch_name | Class 4 |output: count:merge(CASE WHEN diff_day <= 14 THEN t1_id ELSE NULL END) |group by: register_date, t4.t5_id,
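The overestimation can be illustrated with a toy model. The sketch below is a simplified, hypothetical version of the grouping-cardinality estimate (product of the group-by columns' NDVs, capped by the input row count) — `grouping_cardinality` is an invented helper, not Impala's FE code. It shows how honoring the equality predicate `register_date='20230515'` (effective NDV 1 instead of 9,283) shrinks the estimate dramatically.

```python
# Toy model, assuming (as a simplification of the planner) the grouping
# cardinality is the product of the group-by columns' NDVs, capped by the
# input row count. `grouping_cardinality` is invented for illustration.

def grouping_cardinality(ndvs, input_rows):
    est = 1
    for ndv in ndvs:
        est *= ndv
    return min(est, input_rows)

T1_ROWS = 6_031_170_829  # numRows of t1 from the stats above

# NDVs from the ALTER TABLE ... SET COLUMN STATS statements above:
# register_date=9283, t5_id=389, t5_name=523, branch_name=55.
naive = grouping_cardinality([9283, 389, 523, 55], T1_ROWS)

# The predicate register_date='20230515' caps that column's effective NDV at 1:
better = grouping_cardinality([1, 389, 523, 55], T1_ROWS)

print(naive, better)  # the adjusted estimate is hundreds of times smaller
```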
[jira] [Commented] (IMPALA-13077) Equality predicate on partition column and uncorrelated subquery doesn't reduce the cardinality estimate
[ https://issues.apache.org/jira/browse/IMPALA-13077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846770#comment-17846770 ] Quanlong Huang commented on IMPALA-13077: - It seems doable: * catalogd always loads the HMS partition objects and 'numRows' is extracted from the parameters: [https://github.com/apache/impala/blob/f87c20800de9f7dc74e47aa9a8c0dc878f4f0840/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java#L1415] * coordinator always loads all partitions when planning such queries. Pulling partition level column stats like NDVs will help more since they are more accurate than the table level column stats. But using the partition level 'numRows' already helps a lot in this case. > Equality predicate on partition column and uncorrelated subquery doesn't > reduce the cardinality estimate > > > Key: IMPALA-13077 > URL: https://issues.apache.org/jira/browse/IMPALA-13077 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Critical > > Let's say 'part_tbl' is a partitioned table. Its partition key is 'part_key'. > Consider the following query: > {code:sql} > select xxx from part_tbl > where part_key=(select ... from dim_tbl); > {code} > Its query plan is a JoinNode with two ScanNodes. When estimating the > cardinality of the JoinNode, the planner is not aware that 'part_key' is the > partition column and the cardinality of the JoinNode should not be larger > than the max row count across partitions. > The recent work in IMPALA-12018 (Consider runtime filter for cardinality > reduction) helps in some cases since there are runtime filters on the > partition column. But there are still some cases that we overestimate the > cardinality. For instance, 'ss_sold_date_sk' is the only partition key of > tpcds.store_sales. 
The following query > {code:sql} > select count(*) from tpcds.store_sales > where ss_sold_date_sk=( > select min(d_date_sk) + 1000 from tpcds.date_dim);{code} > has query plan: > {noformat} > +-+ > | Explain String | > +-+ > | Max Per-Host Resource Reservation: Memory=18.94MB Threads=6 | > | Per-Host Resource Estimates: Memory=243MB | > | | > | PLAN-ROOT SINK | > | | | > | 09:AGGREGATE [FINALIZE] | > | | output: count:merge(*) | > | | row-size=8B cardinality=1| > | | | > | 08:EXCHANGE [UNPARTITIONED] | > | | | > | 04:AGGREGATE| > | | output: count(*) | > | | row-size=8B cardinality=1| > | | | > | 03:HASH JOIN [LEFT SEMI JOIN, BROADCAST]| > | | hash predicates: ss_sold_date_sk = min(d_date_sk) + 1000 | > | | runtime filters: RF000 <- min(d_date_sk) + 1000 | > | | row-size=4B cardinality=2.88M < Should be max(numRows) across > partitions > | | | > | |--07:EXCHANGE [BROADCAST] | > | | || > | | 06:AGGREGATE [FINALIZE] | > | | | output: min:merge(d_date_sk) | > | | | row-size=4B cardinality=1 | > | | || > | | 05:EXCHANGE [UNPARTITIONED] | > | | || > | | 02:AGGREGATE | > | | | output: min(d_date_sk)| > | | | row-size=4B cardinality=1 | > | | || > | | 01:SCAN HDFS [tpcds.date_dim]| > | | HDFS partitions=1/1 files=1 size=9.84MB | > | | row-size=4B cardinality=73.05K| > | | | > | 00:SCAN HDFS [tpcds.store_sales]| > |HDFS
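The idea in the comment can be sketched as follows. The helper names and the partition dict are hypothetical; in catalogd the 'numRows' value is read from the HMS partition parameters (the HdfsPartition.java line linked above), and the cap would apply when the equality predicate pins the scan to a single, as-yet-unknown partition.

```python
# Hedged sketch of the proposed cap, with invented helper names: on an
# equality predicate against a partition key, the join output should not
# exceed the largest per-partition 'numRows'.

def max_partition_rows(partitions):
    """Largest 'numRows' across partitions, or None if no partition has stats."""
    rows = [int(p['numRows']) for p in partitions.values() if 'numRows' in p]
    return max(rows) if rows else None

def capped_join_cardinality(join_estimate, partitions):
    cap = max_partition_rows(partitions)
    return join_estimate if cap is None else min(join_estimate, cap)

# Illustrative (made-up) per-partition stats in the style of tpcds.store_sales:
partitions = {
    'ss_sold_date_sk=2450816': {'numRows': '1571'},
    'ss_sold_date_sk=2452642': {'numRows': '2231'},
    'ss_sold_date_sk=2451000': {},  # a partition without stats is skipped
}
# Instead of keeping the 2.88M scan cardinality, the estimate drops to 2231:
print(capped_join_cardinality(2_880_000, partitions))
```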
[jira] [Assigned] (IMPALA-13077) Equality predicate on partition column and uncorrelated subquery doesn't reduce the cardinality estimate
[ https://issues.apache.org/jira/browse/IMPALA-13077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang reassigned IMPALA-13077: --- Assignee: Quanlong Huang > Equality predicate on partition column and uncorrelated subquery doesn't > reduce the cardinality estimate > > > Key: IMPALA-13077 > URL: https://issues.apache.org/jira/browse/IMPALA-13077 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Critical > > Let's say 'part_tbl' is a partitioned table. Its partition key is 'part_key'. > Consider the following query: > {code:sql} > select xxx from part_tbl > where part_key=(select ... from dim_tbl); > {code} > Its query plan is a JoinNode with two ScanNodes. When estimating the > cardinality of the JoinNode, the planner is not aware that 'part_key' is the > partition column and the cardinality of the JoinNode should not be larger > than the max row count across partitions. > The recent work in IMPALA-12018 (Consider runtime filter for cardinality > reduction) helps in some cases since there are runtime filters on the > partition column. But there are still some cases that we overestimate the > cardinality. For instance, 'ss_sold_date_sk' is the only partition key of > tpcds.store_sales. 
The following query > {code:sql} > select count(*) from tpcds.store_sales > where ss_sold_date_sk=( > select min(d_date_sk) + 1000 from tpcds.date_dim);{code} > has query plan: > {noformat} > +-+ > | Explain String | > +-+ > | Max Per-Host Resource Reservation: Memory=18.94MB Threads=6 | > | Per-Host Resource Estimates: Memory=243MB | > | | > | PLAN-ROOT SINK | > | | | > | 09:AGGREGATE [FINALIZE] | > | | output: count:merge(*) | > | | row-size=8B cardinality=1| > | | | > | 08:EXCHANGE [UNPARTITIONED] | > | | | > | 04:AGGREGATE| > | | output: count(*) | > | | row-size=8B cardinality=1| > | | | > | 03:HASH JOIN [LEFT SEMI JOIN, BROADCAST]| > | | hash predicates: ss_sold_date_sk = min(d_date_sk) + 1000 | > | | runtime filters: RF000 <- min(d_date_sk) + 1000 | > | | row-size=4B cardinality=2.88M < Should be max(numRows) across > partitions > | | | > | |--07:EXCHANGE [BROADCAST] | > | | || > | | 06:AGGREGATE [FINALIZE] | > | | | output: min:merge(d_date_sk) | > | | | row-size=4B cardinality=1 | > | | || > | | 05:EXCHANGE [UNPARTITIONED] | > | | || > | | 02:AGGREGATE | > | | | output: min(d_date_sk)| > | | | row-size=4B cardinality=1 | > | | || > | | 01:SCAN HDFS [tpcds.date_dim]| > | | HDFS partitions=1/1 files=1 size=9.84MB | > | | row-size=4B cardinality=73.05K| > | | | > | 00:SCAN HDFS [tpcds.store_sales]| > |HDFS partitions=1824/1824 files=1824 size=346.60MB | > |runtime filters: RF000 -> ss_sold_date_sk| > |row-size=4B cardinality=2.88M| > +-+{noformat} > CC [~boroknagyz], [~rizaon] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-9577) Use `system_unsync` time for Kudu test clusters
[ https://issues.apache.org/jira/browse/IMPALA-9577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-9577: --- Fix Version/s: Impala 3.4.2 > Use `system_unsync` time for Kudu test clusters > --- > > Key: IMPALA-9577 > URL: https://issues.apache.org/jira/browse/IMPALA-9577 > Project: IMPALA > Issue Type: Improvement >Reporter: Grant Henke >Assignee: Grant Henke >Priority: Major > Fix For: Impala 4.0.0, Impala 3.4.2 > > > Recently Kudu made enhancements to time source configuration and adjusted the > time source for local clusters/tests to `system_unsync`. Impala should mirror > that behavior in Impala test clusters given there is no need to require > NTP-synchronized clock for a test where all the participating Kudu masters > and tablet servers are run at the same node using the same local wallclock. > > See the Kudu commit here for details: > [https://github.com/apache/kudu/commit/eb2b70d4b96be2fc2fdd6b3625acc284ac5774be] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-13077) Equality predicate on partition column and uncorrelated subquery doesn't reduce the cardinality estimate
Quanlong Huang created IMPALA-13077: --- Summary: Equality predicate on partition column and uncorrelated subquery doesn't reduce the cardinality estimate Key: IMPALA-13077 URL: https://issues.apache.org/jira/browse/IMPALA-13077 Project: IMPALA Issue Type: Bug Components: Frontend Reporter: Quanlong Huang Let's say 'part_tbl' is a partitioned table. Its partition key is 'part_key'. Consider the following query: {code:sql} select xxx from part_tbl where part_key=(select ... from dim_tbl); {code} Its query plan is a JoinNode with two ScanNodes. When estimating the cardinality of the JoinNode, the planner is not aware that 'part_key' is the partition column and the cardinality of the JoinNode should not be larger than the max row count across partitions. The recent work in IMPALA-12018 (Consider runtime filter for cardinality reduction) helps in some cases since there are runtime filters on the partition column. But there are still some cases that we overestimate the cardinality. For instance, 'ss_sold_date_sk' is the only partition key of tpcds.store_sales. 
The following query {code:sql} select count(*) from tpcds.store_sales where ss_sold_date_sk=( select min(d_date_sk) + 1000 from tpcds.date_dim);{code} has query plan: {noformat} +-+ | Explain String | +-+ | Max Per-Host Resource Reservation: Memory=18.94MB Threads=6 | | Per-Host Resource Estimates: Memory=243MB | | | | PLAN-ROOT SINK | | | | | 09:AGGREGATE [FINALIZE] | | | output: count:merge(*) | | | row-size=8B cardinality=1| | | | | 08:EXCHANGE [UNPARTITIONED] | | | | | 04:AGGREGATE| | | output: count(*) | | | row-size=8B cardinality=1| | | | | 03:HASH JOIN [LEFT SEMI JOIN, BROADCAST]| | | hash predicates: ss_sold_date_sk = min(d_date_sk) + 1000 | | | runtime filters: RF000 <- min(d_date_sk) + 1000 | | | row-size=4B cardinality=2.88M < Should be max(numRows) across partitions | | | | |--07:EXCHANGE [BROADCAST] | | | || | | 06:AGGREGATE [FINALIZE] | | | | output: min:merge(d_date_sk) | | | | row-size=4B cardinality=1 | | | || | | 05:EXCHANGE [UNPARTITIONED] | | | || | | 02:AGGREGATE | | | | output: min(d_date_sk)| | | | row-size=4B cardinality=1 | | | || | | 01:SCAN HDFS [tpcds.date_dim]| | | HDFS partitions=1/1 files=1 size=9.84MB | | | row-size=4B cardinality=73.05K| | | | | 00:SCAN HDFS [tpcds.store_sales]| |HDFS partitions=1824/1824 files=1824 size=346.60MB | |runtime filters: RF000 -> ss_sold_date_sk| |row-size=4B cardinality=2.88M| +-+{noformat} CC [~boroknagyz], [~rizaon] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13071) Update the doc of Impala components
Quanlong Huang created IMPALA-13071: --- Summary: Update the doc of Impala components Key: IMPALA-13071 URL: https://issues.apache.org/jira/browse/IMPALA-13071 Project: IMPALA Issue Type: Documentation Reporter: Quanlong Huang We need to update some descriptions in the doc of Impala components. [https://impala.apache.org/docs/build/asf-site-html/topics/impala_components.html] In the section of "The Impala Catalog Service", this is stale: {quote}When you create a table, load data, and so on through Hive, you do need to issue REFRESH or INVALIDATE METADATA on an Impala daemon before executing a query there. {quote} We should mention "Automatic Invalidation/Refresh of Metadata", a.k.a. the HMS event processor, and add links for it. Change "Impala daemons" to "Impala Coordinators" in this sentence: {quote}The Impala component known as the Catalog Service relays the metadata changes from Impala SQL statements to all the Impala {color:#de350b}daemons{color} in a cluster. {quote} Also add this link for "On-demand Metadata": [https://impala.apache.org/docs/build/asf-site-html/topics/impala_metadata.html] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13070) Introduce concepts in the query plan and execution
Quanlong Huang created IMPALA-13070: --- Summary: Introduce concepts in the query plan and execution Key: IMPALA-13070 URL: https://issues.apache.org/jira/browse/IMPALA-13070 Project: IMPALA Issue Type: Documentation Reporter: Quanlong Huang We currently have 3 sections for "Impala Concepts": * Components of the Impala Server * Developing Impala Applications * How Impala Fits Into the Hadoop Ecosystem [https://impala.apache.org/docs/build/asf-site-html/topics/impala_concepts.html] It'd be helpful to introduce concepts used in the query plan and query execution, e.g. * Coordinator & Executor * Fragment & Fragment Instance * Operator, Pipeline * Cardinality, Memory Reservation, Memory Estimate * Split/ScanRange * Runtime Filter * Query Profile -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Reopened] (IMPALA-11858) admissiond incorrectly caps memory limit to its process memory
[ https://issues.apache.org/jira/browse/IMPALA-11858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang reopened IMPALA-11858: - > admissiond incorrectly caps memory limit to its process memory > -- > > Key: IMPALA-11858 > URL: https://issues.apache.org/jira/browse/IMPALA-11858 > Project: IMPALA > Issue Type: Bug >Reporter: Abhishek Rawat >Assignee: Abhishek Rawat >Priority: Critical > > When admission controller is running as a separate daemon it incorrectly caps > memory limit for the query to its process limit. This is also incorrect > behavior when admission controller is running in coordinator as executors > could have different memory limit compared to coordinator. > https://github.com/apache/impala/blob/master/be/src/scheduling/schedule-state.cc#L312#L313 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-11858) admissiond incorrectly caps memory limit to its process memory
[ https://issues.apache.org/jira/browse/IMPALA-11858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang resolved IMPALA-11858. - Fix Version/s: Impala 4.3.0 Resolution: Fixed > admissiond incorrectly caps memory limit to its process memory > -- > > Key: IMPALA-11858 > URL: https://issues.apache.org/jira/browse/IMPALA-11858 > Project: IMPALA > Issue Type: Bug >Reporter: Abhishek Rawat >Assignee: Abhishek Rawat >Priority: Critical > Fix For: Impala 4.3.0 > > > When admission controller is running as a separate daemon it incorrectly caps > memory limit for the query to its process limit. This is also incorrect > behavior when admission controller is running in coordinator as executors > could have different memory limit compared to coordinator. > https://github.com/apache/impala/blob/master/be/src/scheduling/schedule-state.cc#L312#L313 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
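A minimal sketch of the corrected behavior, with invented names (the real logic lives in be/src/scheduling/schedule-state.cc, linked above): the per-query memory limit must be capped by the executors' process memory limits, not by the process limit of whichever daemon happens to run admission control.

```python
# Hypothetical model of the bug and the fix; names are invented.

GB = 1024 ** 3

def effective_query_mem_limit(requested, admission_process_limit,
                              executor_process_limits, buggy=False):
    if buggy:
        # Bug: caps the query to the admission daemon's own process memory,
        # which is wrong when admissiond runs as a separate (small) daemon,
        # and also when coordinator and executors have different limits.
        return min(requested, admission_process_limit)
    # Fix sketch: cap to the smallest executor process limit instead.
    return min(requested, min(executor_process_limits))

# admissiond with a 2 GB process limit; executors with 64 GB each.
print(effective_query_mem_limit(16 * GB, 2 * GB, [64 * GB] * 3, buggy=True) // GB)  # 2
print(effective_query_mem_limit(16 * GB, 2 * GB, [64 * GB] * 3) // GB)              # 16
```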
[jira] [Resolved] (IMPALA-11499) Refactor UrlEncode function to handle special characters
[ https://issues.apache.org/jira/browse/IMPALA-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang resolved IMPALA-11499. - Fix Version/s: Impala 4.5.0 Resolution: Fixed Resolving this. Thank [~pranav.lodha] ! > Refactor UrlEncode function to handle special characters > > > Key: IMPALA-11499 > URL: https://issues.apache.org/jira/browse/IMPALA-11499 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Quanlong Huang >Assignee: Pranav Yogi Lodha >Priority: Critical > Fix For: Impala 4.5.0 > > > Partition values are incorrectly URL-encoded in backend for unicode > characters, e.g. '运营业务数据' is encoded to '�%FFBF�营业务数据' which is wrong. > To reproduce the issue, first create a partition table: > {code:sql} > create table my_part_tbl (id int) partitioned by (p string) stored as parquet; > {code} > Then insert data into it using partition values containing '运'. They will > fail: > {noformat} > [localhost:21050] default> insert into my_part_tbl partition(p='运营业务数据') > values (0); > Query: insert into my_part_tbl partition(p='运营业务数据') values (0) > Query submitted at: 2022-08-16 10:03:56 (Coordinator: > http://quanlong-OptiPlex-BJ:25000) > Query progress can be monitored at: > http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=404ac3027c4b7169:39d16a2d > ERROR: Error(s) moving partition files. 
[jira] [Resolved] (IMPALA-11499) Refactor UrlEncode function to handle special characters
[ https://issues.apache.org/jira/browse/IMPALA-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang resolved IMPALA-11499. - Fix Version/s: Impala 4.5.0 Resolution: Fixed Resolving this. Thank [~pranav.lodha] ! > Refactor UrlEncode function to handle special characters > > > Key: IMPALA-11499 > URL: https://issues.apache.org/jira/browse/IMPALA-11499 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Quanlong Huang >Assignee: Pranav Yogi Lodha >Priority: Critical > Fix For: Impala 4.5.0 > > > Partition values are incorrectly URL-encoded in backend for unicode > characters, e.g. '运营业务数据' is encoded to '�%FFBF�营业务数据' which is wrong. > To reproduce the issue, first create a partition table: > {code:sql} > create table my_part_tbl (id int) partitioned by (p string) stored as parquet; > {code} > Then insert data into it using partition values containing '运'. They will > fail: > {noformat} > [localhost:21050] default> insert into my_part_tbl partition(p='运营业务数据') > values (0); > Query: insert into my_part_tbl partition(p='运营业务数据') values (0) > Query submitted at: 2022-08-16 10:03:56 (Coordinator: > http://quanlong-OptiPlex-BJ:25000) > Query progress can be monitored at: > http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=404ac3027c4b7169:39d16a2d > ERROR: Error(s) moving partition files. 
First error (of 1) was: Hdfs op > (RENAME > hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/404ac3027c4b7169_39d16a2d/.404ac3027c4b7169-39d16a2d_1475855322_dir/p=�%FFBF�营业务数据/404ac3027c4b7169-39d16a2d_1585092794_data.0.parq > TO > hdfs://localhost:20500/test-warehouse/my_part_tbl/p=�%FFBF�营业务数据/404ac3027c4b7169-39d16a2d_1585092794_data.0.parq) > failed, error was: > hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/404ac3027c4b7169_39d16a2d/.404ac3027c4b7169-39d16a2d_1475855322_dir/p=�%FFBF�营业务数据/404ac3027c4b7169-39d16a2d_1585092794_data.0.parq > Error(5): Input/output error > [localhost:21050] default> insert into my_part_tbl partition(p='运') values > (0); > Query: insert into my_part_tbl partition(p='运') values (0) > Query submitted at: 2022-08-16 10:04:22 (Coordinator: > http://quanlong-OptiPlex-BJ:25000) > Query progress can be monitored at: > http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=a64e5883473ec28d:86e7e335 > ERROR: Error(s) moving partition files. 
First error (of 1) was: Hdfs op > (RENAME > hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/a64e5883473ec28d_86e7e335/.a64e5883473ec28d-86e7e335_1582623091_dir/p=�%FFBF�/a64e5883473ec28d-86e7e335_163454510_data.0.parq > TO > hdfs://localhost:20500/test-warehouse/my_part_tbl/p=�%FFBF�/a64e5883473ec28d-86e7e335_163454510_data.0.parq) > failed, error was: > hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/a64e5883473ec28d_86e7e335/.a64e5883473ec28d-86e7e335_1582623091_dir/p=�%FFBF�/a64e5883473ec28d-86e7e335_163454510_data.0.parq > Error(5): Input/output error > {noformat} > However, a partition value without the character '运' is OK: > {noformat} > [localhost:21050] default> insert into my_part_tbl partition(p='营业务数据') > values (0); > Query: insert into my_part_tbl partition(p='营业务数据') values (0) > Query submitted at: 2022-08-16 10:04:13 (Coordinator: > http://quanlong-OptiPlex-BJ:25000) > Query progress can be monitored at: > http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=b04894bfcfc3836a:b1ac9036 > Modified 1 row(s) in 0.21s > {noformat} > Hive is able to execute all these statements. > I'm able to narrow down the issue to the backend, where we URL-encode the > partition value in HdfsTableSink::InitOutputPartition(): > {code:cpp} > string value_str; > partition_key_expr_evals_[j]->PrintValue(value, &value_str); > // Directory names containing partition-key values need to be > UrlEncoded, in > // particular to avoid problems when '/' is part of the key value > (which might > // occur, for example, with date strings). Hive will URL decode the > value > // transparently when Impala's frontend asks the metastore for > partition key values, > // which makes it particularly important that we use the same encoding > as Hive. It's > // also not necessary to encode the values when writing partition > metadata. You can > // check this with 'show partitions ' in Hive, followed by a > select from a > // decoded partition key value. 
> string encoded_str; > UrlEncode(value_str, &encoded_str, true); > string part_key_value = (encoded_str.empty() ? >
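The fix needs an encoder that percent-escapes each UTF-8 byte of the partition value, so that multibyte characters like '运' survive the round trip. A minimal Python sketch of byte-wise percent-encoding, shown as an illustration of the expected behavior rather than Impala's actual C++ UrlEncode:

```python
from urllib.parse import quote, unquote

def encode_partition_value(value: str) -> str:
    # Percent-escape every byte of the string's UTF-8 encoding.
    # Working byte-by-byte keeps multibyte characters intact: '运'
    # (U+8FD0) becomes its three UTF-8 bytes E8 BF 90, each escaped.
    return quote(value, safe='')
```

Here encode_partition_value('运') yields '%E8%BF%90', and unquote() restores the original string; that round trip is exactly what the buggy encoding broke.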
[jira] [Updated] (IMPALA-12688) Support JSON profile imports in webUI
[ https://issues.apache.org/jira/browse/IMPALA-12688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-12688: Fix Version/s: Impala 4.4.0 > Support JSON profile imports in webUI > - > > Key: IMPALA-12688 > URL: https://issues.apache.org/jira/browse/IMPALA-12688 > Project: IMPALA > Issue Type: New Feature >Reporter: Surya Hebbar >Assignee: Surya Hebbar >Priority: Major > Fix For: Impala 4.4.0 > > Attachments: clear_all_button.png, descending_order_start_time.png, > imported_profiles_section.png, imported_queries_button.png, > imported_queries_list.png, imported_queries_page.png, > imported_query_statement.png, imported_query_text_plan.png, > imported_query_timeline.png, multiple_query_profile_import.png > > > It would be helpful for users to visualize the query timeline by selecting a > local JSON query profile. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-10451) TestAvroSchemaResolution.test_avro_schema_resolution fails when bumping Hive to have HIVE-24157
[ https://issues.apache.org/jira/browse/IMPALA-10451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang reassigned IMPALA-10451: --- Assignee: Joe McDonnell (was: Quanlong Huang) > TestAvroSchemaResolution.test_avro_schema_resolution fails when bumping Hive > to have HIVE-24157 > --- > > Key: IMPALA-10451 > URL: https://issues.apache.org/jira/browse/IMPALA-10451 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Joe McDonnell >Priority: Major > > TestAvroSchemaResolution.test_avro_schema_resolution recently fails when > building against a Hive version with HIVE-24157. > {code:java} > query_test.test_avro_schema_resolution.TestAvroSchemaResolution.test_avro_schema_resolution[protocol: > beeswax | exec_option: \{'batch_size': 0, 'num_nodes': 0, > 'disable_codegen_rows_threshold': 0, 'disable_codegen': False, > 'abort_on_error': 1, 'exec_single_node_rows_threshold': 0} | table_format: > avro/snap/block] (from pytest) > query_test/test_avro_schema_resolution.py:36: in test_avro_schema_resolution > self.run_test_case('QueryTest/avro-schema-resolution', vector, > unique_database) > common/impala_test_suite.py:690: in run_test_case > self.__verify_results_and_errors(vector, test_section, result, use_db) > common/impala_test_suite.py:523: in __verify_results_and_errors > replace_filenames_with_placeholder) > common/test_result_verifier.py:456: in verify_raw_results > VERIFIER_MAP[verifier](expected, actual) > common/test_result_verifier.py:278: in verify_query_result_is_equal > assert expected_results == actual_results > E assert Comparing QueryTestResults (expected vs actual): > E 10 != 0 > {code} > The failed query is > {code:sql} > select count(*) from functional_avro_snap.avro_coldef {code} > The cause is that data loading for avro_coldef failed. 
The DML is > {code:sql} > INSERT OVERWRITE TABLE avro_coldef PARTITION(year=2014, month=1) > SELECT bool_col, tinyint_col, smallint_col, int_col, bigint_col, > float_col, double_col, date_string_col, string_col, timestamp_col > FROM (select * from functional.alltypes order by id limit 5) a; > {code} > The failure (found in HS2) is: > {code} > 2021-01-24T01:52:16,340 ERROR [9433ee64-d706-4fa4-a146-18d71bf17013 > HiveServer2-Handler-Pool: Thread-4946] parse.CalcitePlanner: CBO failed, > skipping CBO. > org.apache.hadoop.hive.ql.exec.UDFArgumentException: Casting DATE/TIMESTAMP > types to NUMERIC is prohibited (hive.strict.timestamp.conversion) > at > org.apache.hadoop.hive.ql.udf.TimestampCastRestrictorResolver.getEvalMethod(TimestampCastRestrictorResolver.java:62) > ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169] > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.initialize(GenericUDFBridge.java:168) > ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169] > at > org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:149) > ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169] > at > org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:260) > ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169] > at > org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:292) > ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169] > at > org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getFuncExprNodeDescWithUdfData(TypeCheckProcFactory.java:987) > ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169] > at > org.apache.hadoop.hive.ql.parse.ParseUtils.createConversionCast(ParseUtils.java:163) > ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169] > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genConversionSelectOperator(SemanticAnalyzer.java:8551) > 
~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169] > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genFileSinkPlan(SemanticAnalyzer.java:7908) > ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169] > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPostGroupByBodyPlan(SemanticAnalyzer.java:11100) > ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169] > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:10972) > ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169] > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11901) > ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169] > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:11771) > ~[hive-exec-3.1.3000.7.1.6.0-169.jar:3.1.3000.7.1.6.0-169] > at >
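With HIVE-24157 (hive.strict.timestamp.conversion), the implicit TIMESTAMP-to-NUMERIC cast in the data-loading DML becomes an error, so the conversion has to be made explicit in the statement (e.g. via unix_timestamp() or an explicit CAST). A Python sketch of the same principle, that a lossy conversion should be an explicit opt-in; the helper name is illustrative, not part of any Hive or Impala API:

```python
from datetime import datetime, timezone

def to_epoch_seconds(ts: datetime) -> int:
    # An explicit conversion the caller opts into, mirroring what
    # hive.strict.timestamp.conversion enforces: no silent
    # TIMESTAMP -> NUMERIC coercion.
    return int(ts.replace(tzinfo=timezone.utc).timestamp())
```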
[jira] [Created] (IMPALA-13066) SHOW CREATE TABLE with stats and partitions
Quanlong Huang created IMPALA-13066: --- Summary: SHOW CREATE TABLE with stats and partitions Key: IMPALA-13066 URL: https://issues.apache.org/jira/browse/IMPALA-13066 Project: IMPALA Issue Type: New Feature Components: Backend, Frontend Reporter: Quanlong Huang SHOW CREATE TABLE produces the statement to create the table. In practice, we also want the column stats and partitions. It'd be helpful to add an option that also produces the ADD PARTITION and SET COLUMN STATS statements. E.g. {code:sql} SHOW CREATE TABLE my_tbl WITH STATS;{code} produces {code:sql} CREATE TABLE my_tbl ...; ALTER TABLE my_tbl ADD PARTITION ...; ALTER TABLE my_tbl PARTITION (...) SET TBLPROPERTIES('numRows'='3', 'STATS_GENERATED_VIA_STATS_TASK'='true'); ALTER TABLE my_tbl SET COLUMN STATS c1 ('numDVs'='19','numNulls'='0','maxSize'='8','avgSize'='8'); {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
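The extra output could be assembled mechanically from the table's partition and column metadata. A hypothetical Python sketch; the input shapes (a partition-spec-to-row-count map and a column-stats map) are assumptions for illustration, not Impala's metadata model:

```python
def stats_ddl(table: str, partitions: dict, col_stats: dict) -> list:
    """Render ALTER TABLE statements restoring partitions and column stats.

    `partitions` maps a partition spec string ("p=1") to its row count;
    `col_stats` maps a column name to its stats key/value pairs.
    """
    stmts = []
    for spec, num_rows in partitions.items():
        stmts.append(f"ALTER TABLE {table} ADD PARTITION ({spec});")
        stmts.append(
            f"ALTER TABLE {table} PARTITION ({spec}) SET TBLPROPERTIES("
            f"'numRows'='{num_rows}', 'STATS_GENERATED_VIA_STATS_TASK'='true');")
    for col, stats in col_stats.items():
        kv = ','.join(f"'{k}'='{v}'" for k, v in stats.items())
        stmts.append(f"ALTER TABLE {table} SET COLUMN STATS {col} ({kv});")
    return stmts
```

For example, stats_ddl('my_tbl', {'p=1': 3}, {'c1': {'numDVs': 19}}) emits the three statement kinds in the order shown in the description above.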
[jira] [Created] (IMPALA-13065) Introduce package scripts to launch Impala processes
Quanlong Huang created IMPALA-13065: --- Summary: Introduce package scripts to launch Impala processes Key: IMPALA-13065 URL: https://issues.apache.org/jira/browse/IMPALA-13065 Project: IMPALA Issue Type: Documentation Reporter: Quanlong Huang We should add documentation on how to use the scripts installed by the RPM/DEB packages at https://impala.apache.org/docs/build/html/topics/impala_processes.html CC [~yx91490] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (IMPALA-13064) Install services from RPM/DEB packages
Quanlong Huang created IMPALA-13064: --- Summary: Install services from RPM/DEB packages Key: IMPALA-13064 URL: https://issues.apache.org/jira/browse/IMPALA-13064 Project: IMPALA Issue Type: New Feature Components: Infrastructure Reporter: Quanlong Huang Our doc mentions using the {{service}} command to start Impala processes: https://impala.apache.org/docs/build/html/topics/impala_processes.html Start the statestore service using a command similar to the following: {code} $ sudo service impala-state-store start{code} Start the catalog service using a command similar to the following: {code} $ sudo service impala-catalog start{code} Start the Impala daemon services using a command similar to the following: {code} $ sudo service impala-server start{code} The RPM/DEB packages should install these services and launch the processes as the 'impala' user. CC [~yx91490] -- This message was sent by Atlassian Jira (v8.20.10#820010)
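On systemd-based distributions, the packages could ship unit files rather than legacy init scripts behind the {{service}} command. A hypothetical sketch of such a unit; the unit name, paths, and flags are assumptions for illustration, not the actual packaging layout:

```ini
# Hypothetical /usr/lib/systemd/system/impala-catalog.service; the
# binary path and flagfile location are assumed, not taken from the
# real packages.
[Unit]
Description=Impala Catalog Server
After=network.target

[Service]
User=impala
Group=impala
ExecStart=/opt/impala/bin/catalogd --flagfile=/etc/impala/catalogd.flags
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

With such a unit installed, `sudo systemctl start impala-catalog` replaces the `sudo service impala-catalog start` form quoted above.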
[jira] [Commented] (IMPALA-13034) Add logs for slow HTTP requests dumping the profile
[ https://issues.apache.org/jira/browse/IMPALA-13034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844655#comment-17844655 ] Quanlong Huang commented on IMPALA-13034: - Uploaded a patch to add logs and counters first: [https://gerrit.cloudera.org/c/21412/] With that we can identify the issue and find users that abuse the HTTP requests. Filed IMPALA-13063 for a fix for the issue. We can discuss the solutions there. > Add logs for slow HTTP requests dumping the profile > --- > > Key: IMPALA-13034 > URL: https://issues.apache.org/jira/browse/IMPALA-13034 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Critical > > There are several endpoints in the WebUI that can dump a query profile: > /query_profile, /query_profile_encoded, /query_profile_plain_text, > /query_profile_json > The HTTP handler thread goes into ImpalaServer::GetRuntimeProfileOutput(), > which acquires the lock of the ClientRequestState. This could block client > requests fetching query results. We should add warning logs when such HTTP > requests run slow (e.g. when the profile is too large to download in a short > time). The IP address and other info of such requests should also be logged. > Related code: > https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-server.cc#L736 > https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-beeswax-server.cc#L601 > https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-hs2-server.cc#L207 -- This message was sent by Atlassian Jira (v8.20.10#820010)
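The logging side of such a patch boils down to timing the handler and warning with the request's info when it exceeds a threshold. A minimal Python sketch of that pattern; the function and parameter names are illustrative, not the actual patch:

```python
import logging
import time

def log_if_slow(handler, request_info, threshold_s=5.0):
    """Run a request handler; warn with client info if it ran too long.

    `handler` is a zero-argument callable producing the response and
    `request_info` is a string (e.g. client IP and endpoint); both are
    stand-ins for the real webserver types.
    """
    start = time.monotonic()
    try:
        return handler()  # the response passes through unchanged
    finally:
        elapsed = time.monotonic() - start
        if elapsed > threshold_s:
            logging.warning("Slow HTTP request %s took %.1fs",
                            request_info, elapsed)
```

Because the return value passes through untouched, a wrapper like this can sit on the existing endpoint paths without changing their behavior.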
[jira] [Created] (IMPALA-13063) HTTP requests on in-flight queries blocks query execution in coordinator side
Quanlong Huang created IMPALA-13063: --- Summary: HTTP requests on in-flight queries blocks query execution in coordinator side Key: IMPALA-13063 URL: https://issues.apache.org/jira/browse/IMPALA-13063 Project: IMPALA Issue Type: Bug Components: Backend Reporter: Quanlong Huang This is a follow-up task for IMPALA-13034. HTTP requests on in-flight queries usually acquire the lock of the ClientRequestState. This could block client requests fetching query results. E.g. there are several endpoints in the WebUI that can dump a query profile: /query_profile, /query_profile_encoded, /query_profile_plain_text, /query_profile_json. If the profile is huge, such requests impact the query performance. Fetching the details (profile, exec summary, etc.) of an in-flight query has lower priority and shouldn't block query execution. -- This message was sent by Atlassian Jira (v8.20.10#820010)
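One way to keep low-priority readers from blocking execution is to have them try the lock with a short timeout and fall back (retry later, or serve a cached profile) instead of waiting. A Python sketch of the idea; this is one possible approach, not the fix chosen for this issue:

```python
import threading

lock = threading.Lock()  # stands in for the ClientRequestState lock

def fetch_profile_nonblocking(render, timeout_s=0.1):
    """Try to render a profile without blocking query execution.

    `render` is a zero-argument callable producing the profile text.
    Returns None if the lock could not be taken within the timeout,
    letting the caller retry or serve a cached/stale copy.
    """
    if not lock.acquire(timeout=timeout_s):
        return None
    try:
        return render()
    finally:
        lock.release()
```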
[jira] [Updated] (IMPALA-9577) Use `system_unsync` time for Kudu test clusters
[ https://issues.apache.org/jira/browse/IMPALA-9577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-9577: --- Fix Version/s: Impala 4.0.0 (was: Impala 3.4.0) > Use `system_unsync` time for Kudu test clusters > --- > > Key: IMPALA-9577 > URL: https://issues.apache.org/jira/browse/IMPALA-9577 > Project: IMPALA > Issue Type: Improvement >Reporter: Grant Henke >Assignee: Grant Henke >Priority: Major > Fix For: Impala 4.0.0 > > > Recently Kudu made enhancements to time source configuration and adjusted the > time source for local clusters/tests to `system_unsync`. Impala should mirror > that behavior in Impala test clusters, given there is no need to require an > NTP-synchronized clock for a test where all the participating Kudu masters > and tablet servers run on the same node using the same local wallclock. > > See the Kudu commit here for details: > [https://github.com/apache/kudu/commit/eb2b70d4b96be2fc2fdd6b3625acc284ac5774be] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (IMPALA-13035) Querying metadata tables from non-Iceberg tables throws IllegalArgumentException
[ https://issues.apache.org/jira/browse/IMPALA-13035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-13035: Fix Version/s: Impala 4.5.0 > Querying metadata tables from non-Iceberg tables throws > IllegalArgumentException > > > Key: IMPALA-13035 > URL: https://issues.apache.org/jira/browse/IMPALA-13035 > Project: IMPALA > Issue Type: Bug >Affects Versions: Impala 4.3.0 >Reporter: Peter Rozsa >Assignee: Daniel Becker >Priority: Minor > Labels: impala-iceberg > Fix For: Impala 4.5.0 > > > If a query targets an Iceberg metadata table like default.xy.`files` and the > xy table is not an Iceberg table, then the analyzer throws an > IllegalArgumentException. > The main concern is that IcebergMetadataTable.java:isIcebergMetadataTable is > called before it's validated that the table is indeed an IcebergTable. > Example: > {code:java} > create table xy(a int); > select * from default.xy.`files`;{code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IMPALA-13009) Potential leak of partition deletions in the catalog topic
[ https://issues.apache.org/jira/browse/IMPALA-13009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang resolved IMPALA-13009. - Fix Version/s: Impala 4.5.0 Resolution: Fixed > Potential leak of partition deletions in the catalog topic > -- > > Key: IMPALA-13009 > URL: https://issues.apache.org/jira/browse/IMPALA-13009 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Affects Versions: Impala 4.0.0, Impala 4.1.0, Impala 4.2.0, Impala 4.1.1, > Impala 4.1.2, Impala 4.3.0 >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Critical > Fix For: Impala 4.5.0 > > > Catalogd might not send partition deletions to the catalog topic in the > following scenario: > * Some partitions of a table are dropped. > * The HdfsTable object is then removed before catalogd collects the > dropped partitions. > In that case, catalogd loses track of the dropped partitions, so their entries > persist in the catalog topic until the partition names are reused > again. > Note that the HdfsTable object can be removed by commands like DropTable or > INVALIDATE. > The leaked partitions will be detected when a coordinator restarts. An > IllegalStateException complaining about stale partitions will be reported, and > the table will not be added to the coordinator's catalog cache. 
> {noformat} > E0417 16:41:22.317298 20746 ImpaladCatalog.java:264] Error adding catalog > object: Received stale partition in a statestore update: > THdfsPartition(partitionKeyExprs:[TExpr(nodes:[TExprNode(node_type:INT_LITERAL, > type:TColumnType(types:[TTypeNode(type:SCALAR, > scalar_type:TScalarType(type:INT))]), num_children:0, is_constant:true, > int_literal:TIntLiteral(value:106), is_codegen_disabled:false)])], > location:THdfsPartitionLocation(prefix_index:0, suffix:p=106), id:138, > file_desc:[THdfsFileDesc(file_desc_data:18 00 00 00 00 00 00 00 00 00 0E 00 > 1C 00 18 00 10 00 00 00 08 00 04 00 0E 00 00 00 18 00 00 00 8B 0E 2D EB 8E 01 > 00 00 04 00 00 00 00 00 00 00 0C 00 00 00 01 00 00 00 4C 00 00 00 36 00 00 00 > 34 34 34 37 62 35 66 34 62 30 65 64 66 64 65 31 2D 32 33 33 61 64 62 38 35 30 > 30 30 30 30 30 30 30 5F 36 36 34 31 30 39 33 37 33 5F 64 61 74 61 2E 30 2E 74 > 78 74 00 00 0C 00 14 00 00 00 0C 00...)], access_level:READ_WRITE, > stats:TTableStats(num_rows:-1), is_marked_cached:false, > hms_parameters:{transient_lastDdlTime=1713342582, totalSize=4, > numFilesErasureCoded=0, numFiles=1}, num_blocks:1, total_file_size_bytes:4, > has_incremental_stats:false, write_id:0, db_name:default, tbl_name:my_part, > partition_name:p=106, > hdfs_storage_descriptor:THdfsStorageDescriptor(lineDelim:10, fieldDelim:1, > collectionDelim:1, mapKeyDelim:1, escapeChar:0, quoteChar:1, fileFormat:TEXT, > blockSize:0)) > Java exception follows: > java.lang.IllegalStateException: Received stale partition in a statestore > update: > THdfsPartition(partitionKeyExprs:[TExpr(nodes:[TExprNode(node_type:INT_LITERAL, > type:TColumnType(types:[TTypeNode(type:SCALAR, > scalar_type:TScalarType(type:INT))]), num_children:0, is_constant:true, > int_literal:TIntLiteral(value:106), is_codegen_disabled:false)])], > location:THdfsPartitionLocation(prefix_index:0, suffix:p=106), id:138, > file_desc:[THdfsFileDesc(file_desc_data:18 00 00 00 00 00 00 00 00 00 0E 00 > 1C 00 18 00 10 00 00 
00 08 00 04 00 0E 00 00 00 18 00 00 00 8B 0E 2D EB 8E 01 > 00 00 04 00 00 00 00 00 00 00 0C 00 00 00 01 00 00 00 4C 00 00 00 36 00 00 00 > 34 34 34 37 62 35 66 34 62 30 65 64 66 64 65 31 2D 32 33 33 61 64 62 38 35 30 > 30 30 30 30 30 30 30 5F 36 36 34 31 30 39 33 37 33 5F 64 61 74 61 2E 30 2E 74 > 78 74 00 00 0C 00 14 00 00 00 0C 00...)], access_level:READ_WRITE, > stats:TTableStats(num_rows:-1), is_marked_cached:false, > hms_parameters:{transient_lastDdlTime=1713342582, totalSize=4, > numFilesErasureCoded=0, numFiles=1}, num_blocks:1, total_file_size_bytes:4, > has_incremental_stats:false, write_id:0, db_name:default, tbl_name:my_part, > partition_name:p=106, > hdfs_storage_descriptor:THdfsStorageDescriptor(lineDelim:10, fieldDelim:1, > collectionDelim:1, mapKeyDelim:1, escapeChar:0, quoteChar:1, fileFormat:TEXT, > blockSize:0)) > at > com.google.common.base.Preconditions.checkState(Preconditions.java:512) > at > org.apache.impala.catalog.ImpaladCatalog.addTable(ImpaladCatalog.java:523) > at > org.apache.impala.catalog.ImpaladCatalog.addCatalogObject(ImpaladCatalog.java:334) > at > org.apache.impala.catalog.ImpaladCatalog.updateCatalog(ImpaladCatalog.java:262) > at > org.apache.impala.service.FeCatalogManager$CatalogdImpl.updateCatalogCache(FeCatalogManager.java:120) > at >
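The leak can be modeled with the catalog topic as a map: if dropped partitions are not diffed against the topic before the HdfsTable object disappears, their entries are never deleted. A toy Python model of the invariant that every non-live entry must become a deletion (not catalogd's implementation):

```python
def apply_catalog_update(topic: dict, table: str, live_partitions: set):
    """Maintain the topic for one table given its current live partitions.

    `topic` maps "table/partition_name" keys to entries. Any entry for
    `table` whose partition is no longer live is deleted; skipping that
    step is exactly the leak described above.
    """
    prefix = table + "/"
    for key in [k for k in topic if k.startswith(prefix)]:
        if key[len(prefix):] not in live_partitions:
            del topic[key]           # publish a deletion
    for p in live_partitions:
        topic[prefix + p] = "entry"  # (re)publish live partitions
```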
[jira] [Assigned] (IMPALA-13034) Add logs for slow HTTP requests dumping the profile
[ https://issues.apache.org/jira/browse/IMPALA-13034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang reassigned IMPALA-13034: --- Assignee: Quanlong Huang > Add logs for slow HTTP requests dumping the profile > --- > > Key: IMPALA-13034 > URL: https://issues.apache.org/jira/browse/IMPALA-13034 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Critical > > There are several endpoints in the WebUI that can dump a query profile: > /query_profile, /query_profile_encoded, /query_profile_plain_text, > /query_profile_json > The HTTP handler thread goes into ImpalaServer::GetRuntimeProfileOutput(), > which acquires the lock of the ClientRequestState. This could block client > requests fetching query results. We should add warning logs when such HTTP > requests run slow (e.g. when the profile is too large to download in a short > time). The IP address and other info of such requests should also be logged. > Related code: > https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-server.cc#L736 > https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-beeswax-server.cc#L601 > https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-hs2-server.cc#L207 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (IMPALA-12795) TestWebPage.test_catalog_operation_fields is flaky
[ https://issues.apache.org/jira/browse/IMPALA-12795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang resolved IMPALA-12795. - Fix Version/s: Impala 4.4.0 Resolution: Fixed > TestWebPage.test_catalog_operation_fields is flaky > -- > > Key: IMPALA-12795 > URL: https://issues.apache.org/jira/browse/IMPALA-12795 > Project: IMPALA > Issue Type: Bug >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Critical > Fix For: Impala 4.4.0 > > > Saw the test failed in an internal job: > {noformat} > webserver/test_web_pages.py:942: in test_catalog_operation_fields > assert matched > E assert False{noformat} > That means the CREATE DATABASE statement was not found in the coordinator > webUI. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (IMPALA-13033) impala-profile-tool should support parsing thrift profiles downloaded from WebUI
[ https://issues.apache.org/jira/browse/IMPALA-13033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843633#comment-17843633 ] Quanlong Huang commented on IMPALA-13033: - We can make it more robust by handling the "liness.fail()" error case, e.g. by assigning "line" to "encoded_profile" directly. > impala-profile-tool should support parsing thrift profiles downloaded from > WebUI > > > Key: IMPALA-13033 > URL: https://issues.apache.org/jira/browse/IMPALA-13033 > Project: IMPALA > Issue Type: Bug > Components: Infrastructure >Reporter: Quanlong Huang >Assignee: Anshula Jain >Priority: Major > Labels: newbie, ramp-up > > In the coordinator WebUI, users can download query profiles in > text/json/thrift formats. The thrift profile is the same as one line in the > profile log without the timestamp and query id at the beginning. > impala-profile-tool fails to parse such a file. It should retry parsing the > whole line as the encoded profile. Current code snippet: > {code:cpp} > // Parse out fields from the line. > istringstream liness(line); > int64_t timestamp; > string query_id, encoded_profile; > liness >> timestamp >> query_id >> encoded_profile; > if (liness.fail()) { > cerr << "Error parsing line " << lineno << ": '" << line << "'\n"; > ++errors; > continue; > }{code} > https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/util/impala-profile-tool.cc#L109 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
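The fallback suggested in the comment above (on "liness.fail()", treat the whole line as the encoded profile) could look like the following sketch. The helper `ParseProfileLine` and its signature are illustrative, not the actual impala-profile-tool code.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// First try the profile-log format "<timestamp> <query_id> <encoded_profile>";
// on failure, fall back to treating the entire line as the encoded profile,
// which is the format of a thrift profile downloaded from the WebUI.
// Returns false only for input that is unusable either way (an empty line).
bool ParseProfileLine(const std::string& line, int64_t* timestamp,
                      std::string* query_id, std::string* encoded_profile) {
  std::istringstream liness(line);
  liness >> *timestamp >> *query_id >> *encoded_profile;
  if (!liness.fail()) return true;
  // Fallback described in the comment: assign "line" to "encoded_profile".
  *timestamp = 0;
  query_id->clear();
  *encoded_profile = line;
  return !line.empty();
}
```

With this shape, the existing per-line error path would only be hit for genuinely empty lines rather than for every WebUI-downloaded file.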
[jira] [Updated] (IMPALA-13044) Upgrade bouncycastle to 1.78
[ https://issues.apache.org/jira/browse/IMPALA-13044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-13044: Fix Version/s: Impala 4.5.0 Target Version: (was: Impala 4.4.0) > Upgrade bouncycastle to 1.78 > > > Key: IMPALA-13044 > URL: https://issues.apache.org/jira/browse/IMPALA-13044 > Project: IMPALA > Issue Type: Task > Components: Frontend >Affects Versions: Impala 4.3.0 >Reporter: Peter Rozsa >Assignee: Peter Rozsa >Priority: Major > Fix For: Impala 4.5.0 > > > Impala uses boucycastle:1.68 which contains various CVEs. Upgrading to 1.78 > resolves these security concerns. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-13047) Support restarting a specified impalad in bin/start-impala-cluster.py
[ https://issues.apache.org/jira/browse/IMPALA-13047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842316#comment-17842316 ] Quanlong Huang commented on IMPALA-13047: - Uploaded a patch for review: https://gerrit.cloudera.org/c/21376/ > Support restarting a specified impalad in bin/start-impala-cluster.py > - > > Key: IMPALA-13047 > URL: https://issues.apache.org/jira/browse/IMPALA-13047 > Project: IMPALA > Issue Type: New Feature > Components: Infrastructure >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Major > > Currently, bin/start-impala-cluster.py can restart catalogd, statestored and > *all* impalads. It'd be useful to support only restarting one impalad. We > need this in the debug of IMPALA-13009. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-12835) Transactional tables are unsynced when hms_event_incremental_refresh_transactional_table is disabled
[ https://issues.apache.org/jira/browse/IMPALA-12835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang resolved IMPALA-12835. - Fix Version/s: Impala 4.4.0 Resolution: Fixed Resolving this. Thank [~csringhofer] ! > Transactional tables are unsynced when > hms_event_incremental_refresh_transactional_table is disabled > > > Key: IMPALA-12835 > URL: https://issues.apache.org/jira/browse/IMPALA-12835 > Project: IMPALA > Issue Type: Bug > Components: Catalog >Reporter: Quanlong Huang >Assignee: Csaba Ringhofer >Priority: Critical > Fix For: Impala 4.4.0 > > > There are some test failures when > hms_event_incremental_refresh_transactional_table is disabled: > * > tests/metadata/test_event_processing.py::TestEventProcessing::test_transactional_insert_events > * > tests/metadata/test_event_processing.py::TestEventProcessing::test_event_based_replication > I can reproduce the issue locally: > {noformat} > $ bin/start-impala-cluster.py > --catalogd_args=--hms_event_incremental_refresh_transactional_table=false > impala-shell> create table txn_tbl (id int, val int) stored as parquet > tblproperties > ('transactional'='true','transactional_properties'='insert_only'); > impala-shell> describe txn_tbl; -- make the table loaded in Impala > hive> insert into txn_tbl values(101, 200); > impala-shell> select * from txn_tbl; {noformat} > Impala shows no results until a REFRESH runs on this table. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-13047) Support restarting a specified impalad in bin/start-impala-cluster.py
Quanlong Huang created IMPALA-13047: --- Summary: Support restarting a specified impalad in bin/start-impala-cluster.py Key: IMPALA-13047 URL: https://issues.apache.org/jira/browse/IMPALA-13047 Project: IMPALA Issue Type: New Feature Components: Infrastructure Reporter: Quanlong Huang Assignee: Quanlong Huang Currently, bin/start-impala-cluster.py can restart catalogd, statestored and *all* impalads. It'd be useful to support only restarting one impalad. We need this in the debug of IMPALA-13009. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-12917) Several tests in TestEventProcessingError fail
[ https://issues.apache.org/jira/browse/IMPALA-12917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-12917: Fix Version/s: Impala 4.4.0 > Several tests in TestEventProcessingError fail > -- > > Key: IMPALA-12917 > URL: https://issues.apache.org/jira/browse/IMPALA-12917 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Daniel Becker >Assignee: Venugopal Reddy K >Priority: Blocker > Labels: broken-build > Fix For: Impala 4.4.0 > > > The failing tests are > TestEventProcessingError.test_event_processor_error_alter_partition > TestEventProcessingError.test_event_processor_error_alter_partitions > TestEventProcessingError.test_event_processor_error_commit_compaction_event > TestEventProcessingError.test_event_processor_error_commit_txn > TestEventProcessingError.test_event_processor_error_stress_test > Stacktrace: > {code:java} > E Error: Error while compiling statement: FAILED: Execution Error, return > code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask. 
> java.lang.NullPointerException > E at org.apache.tez.client.TezClient.cleanStagingDir(TezClient.java:424) > E at org.apache.tez.client.TezClient.start(TezClient.java:413) > E at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.startSessionAndContainers(TezSessionState.java:556) > E at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.openInternal(TezSessionState.java:387) > E at > org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:302) > E at > org.apache.hadoop.hive.ql.exec.tez.TezSessionPoolSession.open(TezSessionPoolSession.java:106) > E at > org.apache.hadoop.hive.ql.exec.tez.TezTask.ensureSessionHasResources(TezTask.java:468) > E at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:227) > E at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:213) > E at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:105) > E at org.apache.hadoop.hive.ql.Executor.launchTask(Executor.java:356) > E at org.apache.hadoop.hive.ql.Executor.launchTasks(Executor.java:329) > E at org.apache.hadoop.hive.ql.Executor.runTasks(Executor.java:246) > E at org.apache.hadoop.hive.ql.Executor.execute(Executor.java:107) > E at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:809) > E at org.apache.hadoop.hive.ql.Driver.run(Driver.java:546) > E at org.apache.hadoop.hive.ql.Driver.run(Driver.java:540) > E at > org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:190) > E at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:235) > E at > org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:92) > E at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:340) > E at java.security.AccessController.doPrivileged(Native Method) > E at javax.security.auth.Subject.doAs(Subject.java:422) > E at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899) > E at > 
org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:360) > E at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > E at java.util.concurrent.FutureTask.run(FutureTask.java:266) > E at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > E at java.util.concurrent.FutureTask.run(FutureTask.java:266) > E at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > E at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > E at java.lang.Thread.run(Thread.java:748) (state=08S01,code=1) > {code} > These tests were introduced by IMPALA-12832, [~VenuReddy] could you take a > look? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-13041) Support Reading and Writing Puffin File Stats for Iceberg Tables
[ https://issues.apache.org/jira/browse/IMPALA-13041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-13041: Labels: catalog-2024 impala-iceberg (was: impala-iceberg) > Support Reading and Writing Puffin File Stats for Iceberg Tables > > > Key: IMPALA-13041 > URL: https://issues.apache.org/jira/browse/IMPALA-13041 > Project: IMPALA > Issue Type: Improvement > Components: Backend, Frontend >Reporter: Manish Maheshwari >Priority: Major > Labels: catalog-2024, impala-iceberg > > Puffin File format is an iceberg upstream spec to support stats for iceberg > tables. These stats cannot be both read and used for query planning and > written by Impala today. We want to extend support in Impala to do the below > - > # Read stats from Puffin files > # Write stats to puffin files during load/insert/update/delete commands (as > applicable) > # Modify compute stats command for iceberg tables to compute stats and store > them in Puffin files > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-10848) Provide compile-only option to skip downloading test dependencies
[ https://issues.apache.org/jira/browse/IMPALA-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang reassigned IMPALA-10848: --- Assignee: Quanlong Huang (was: XiangYang) OK, assigning this to myself. > Provide compile-only option to skip downloading test dependencies > - > > Key: IMPALA-10848 > URL: https://issues.apache.org/jira/browse/IMPALA-10848 > Project: IMPALA > Issue Type: Improvement > Components: Infrastructure >Reporter: Quanlong Huang >Assignee: Quanlong Huang >Priority: Major > Attachments: pywebhdfs_failure.png > > > Compiling Impala is not easy for a beginner. A portion of failures are in > downloading/installing dependencies. > For instance, old versions of Impala may fail to compile since cdh components > of old GBNs on S3 are removed. However, the artifacts of cdh component are > only used in testing (minicluster & holding testdata). We can still compile > without them. > Take pip dependencies as another example, here is a failure I got from a > community user. It failed by installing pywebhdfs: > !pywebhdfs_failure.png! 
> However, simple git-grep shows that pywebhdfs is only used in tests: > {code:bash} > $ git grep pywebhdfs > bin/bootstrap_system.sh:# >>> from pywebhdfs.webhdfs import PyWebHdfsClient > infra/python/deps/requirements.txt:pywebhdfs == 0.3.2 > tests/common/impala_test_suite.py: # HDFS: uses a mixture of pywebhdfs > (which is faster than the HDFS CLI) and the > tests/util/hdfs_util.py:from pywebhdfs.webhdfs import PyWebHdfsClient, > errors, _raise_pywebhdfs_exception > tests/util/hdfs_util.py: > _raise_pywebhdfs_exception(response.status_code, response.text) > tests/util/hdfs_util.py: > _raise_pywebhdfs_exception(response.status_code, response.text) > tests/util/hdfs_util.py: > _raise_pywebhdfs_exception(response.status_code, response.text) > tests/util/hdfs_util.py: > _raise_pywebhdfs_exception(response.status_code, response.text) {code} > If the user just wants to compile Impala and deploys it in their existing > Hadoop cluster, dealing with these failures is a waste of their time. > *Target for this JIRA* > * Provide compile-only option to bin/bootstrap_system.sh. It should skip > downloading/installing unused dependencies like postgresql. > * Provide compile-only option to buildall.sh. It should skip downloading > unused cdh/cdp components in compilation. > * Update our > [wiki|https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala] > about this. > Note that we already have some env vars to control the download behaviors, > e.g. SKIP_PYTHON_DOWNLOAD, SKIP_TOOLCHAIN_BOOTSTRAP. We just need to make the > compile-only scenario works with minimal requirements and document it. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-11499) Refactor UrlEncode function to handle special characters
[ https://issues.apache.org/jira/browse/IMPALA-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841255#comment-17841255 ] Quanlong Huang commented on IMPALA-11499: - [~daniel.becker] found the root cause in the review: [https://gerrit.cloudera.org/c/21131/6/be/src/util/coding-util.cc#55] The problem is in this string: {code:cpp} static function HiveShouldEscape = is_any_of("\"#%\\*/:=?\u00FF");{code} "\u00FF" is the unicode of ÿ which is encoded into two bytes in UTF-8: 0xc3 {*}0xbf{*}. "运" is encoded into 3 bytes in UTF-8: 0xe8 *0xbf* 0x90. The second byte *0xbf* matches in the set so it's encoded as "%FFBF". The other bytes remain unchanged. That's the problem. We can find more common Chinese characters that could hit this, e.g. * 近: 0xe8 0xbf 0x91 * 返: 0xe8 0xbf 0x94 * 还: 0xe8 0xbf 0x98 * 这: 0xe8 0xbf 0x99 * 进: 0xe8 0xbf 0x9b * 远: 0xe8 0xbf 0x9c > Refactor UrlEncode function to handle special characters > > > Key: IMPALA-11499 > URL: https://issues.apache.org/jira/browse/IMPALA-11499 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Quanlong Huang >Assignee: Pranav Yogi Lodha >Priority: Critical > > Partition values are incorrectly URL-encoded in backend for unicode > characters, e.g. '运营业务数据' is encoded to '�%FFBF�营业务数据' which is wrong. > To reproduce the issue, first create a partition table: > {code:sql} > create table my_part_tbl (id int) partitioned by (p string) stored as parquet; > {code} > Then insert data into it using partition values containing '运'. They will > fail: > {noformat} > [localhost:21050] default> insert into my_part_tbl partition(p='运营业务数据') > values (0); > Query: insert into my_part_tbl partition(p='运营业务数据') values (0) > Query submitted at: 2022-08-16 10:03:56 (Coordinator: > http://quanlong-OptiPlex-BJ:25000) > Query progress can be monitored at: > http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=404ac3027c4b7169:39d16a2d > ERROR: Error(s) moving partition files. 
First error (of 1) was: Hdfs op > (RENAME > hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/404ac3027c4b7169_39d16a2d/.404ac3027c4b7169-39d16a2d_1475855322_dir/p=�%FFBF�营业务数据/404ac3027c4b7169-39d16a2d_1585092794_data.0.parq > TO > hdfs://localhost:20500/test-warehouse/my_part_tbl/p=�%FFBF�营业务数据/404ac3027c4b7169-39d16a2d_1585092794_data.0.parq) > failed, error was: > hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/404ac3027c4b7169_39d16a2d/.404ac3027c4b7169-39d16a2d_1475855322_dir/p=�%FFBF�营业务数据/404ac3027c4b7169-39d16a2d_1585092794_data.0.parq > Error(5): Input/output error > [localhost:21050] default> insert into my_part_tbl partition(p='运') values > (0); > Query: insert into my_part_tbl partition(p='运') values (0) > Query submitted at: 2022-08-16 10:04:22 (Coordinator: > http://quanlong-OptiPlex-BJ:25000) > Query progress can be monitored at: > http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=a64e5883473ec28d:86e7e335 > ERROR: Error(s) moving partition files. 
First error (of 1) was: Hdfs op > (RENAME > hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/a64e5883473ec28d_86e7e335/.a64e5883473ec28d-86e7e335_1582623091_dir/p=�%FFBF�/a64e5883473ec28d-86e7e335_163454510_data.0.parq > TO > hdfs://localhost:20500/test-warehouse/my_part_tbl/p=�%FFBF�/a64e5883473ec28d-86e7e335_163454510_data.0.parq) > failed, error was: > hdfs://localhost:20500/test-warehouse/my_part_tbl/_impala_insert_staging/a64e5883473ec28d_86e7e335/.a64e5883473ec28d-86e7e335_1582623091_dir/p=�%FFBF�/a64e5883473ec28d-86e7e335_163454510_data.0.parq > Error(5): Input/output error > {noformat} > However, partition value without the character '运' is OK: > {noformat} > [localhost:21050] default> insert into my_part_tbl partition(p='营业务数据') > values (0); > Query: insert into my_part_tbl partition(p='营业务数据') values (0) > Query submitted at: 2022-08-16 10:04:13 (Coordinator: > http://quanlong-OptiPlex-BJ:25000) > Query progress can be monitored at: > http://quanlong-OptiPlex-BJ:25000/query_plan?query_id=b04894bfcfc3836a:b1ac9036 > Modified 1 row(s) in 0.21s > {noformat} > Hive is able to execute all these statements. > I'm able to narrow down the issue into Backend, where we URL-encode the > partition value in HdfsTableSink::InitOutputPartition(): > {code:cpp} > string value_str; > partition_key_expr_evals_[j]->PrintValue(value, _str); > // Directory names containing partition-key values need to be > UrlEncoded, in > // particular to avoid problems when '/' is part of the key value > (which might > // occur,
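The byte-level mismatch identified in the comment above can be demonstrated in isolation: in a char-based character set, "\u00FF" contributes its two UTF-8 bytes 0xC3 0xBF, so the middle byte of multi-byte characters like '运' (U+8FD0, encoded 0xE8 0xBF 0x90) matches spuriously. The helper below is a standalone illustration of that effect, not the Impala UrlEncode code (which uses boost's is_any_of).

```cpp
#include <cassert>
#include <string>

// Byte-wise membership test against a set built from the same string literal
// as HiveShouldEscape. Because "\u00FF" ('ÿ') is the two UTF-8 bytes
// 0xC3 0xBF, the set contains the raw byte 0xBF, and any UTF-8 sequence
// containing that byte (e.g. the second byte of '运') is wrongly flagged,
// producing the mangled "%FFBF" output seen in the bug report.
inline bool ByteWiseNeedsEscape(char c) {
  static const std::string kEscapeSet = "\"#%\\*/:=?\u00FF";
  return kEscapeSet.find(c) != std::string::npos;
}
```

This is why the listed characters (近, 返, 还, 这, 进, 远) all trigger the bug: each shares the 0xBF second byte in its UTF-8 encoding.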
[jira] [Commented] (IMPALA-3192) Toolchain build should be able to use prebuilt artifacts
[ https://issues.apache.org/jira/browse/IMPALA-3192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841087#comment-17841087 ] Quanlong Huang commented on IMPALA-3192: This will be helpful. When building the ORC lib only, I have to manually add a script for it like this {code:bash} # Exit on non-true return value set -e # Exit on reference to uninitialized variable set -u set -o pipefail source ./init.sh source ./init-compiler.sh export LZ4_VERSION=1.9.3 export PROTOBUF_VERSION=3.14.0 export SNAPPY_VERSION=1.1.8 export ZLIB_VERSION=1.2.13 export ZSTD_VERSION=1.5.2 export GOOGLETEST_VERSION=1.8.0 $SOURCE_DIR/source/protobuf/build.sh $SOURCE_DIR/source/zlib/build.sh $SOURCE_DIR/source/googletest/build.sh $SOURCE_DIR/source/snappy/build.sh $SOURCE_DIR/source/lz4/build.sh $SOURCE_DIR/source/zstd/build.sh ORC_VERSION=1.7.9-p10 $SOURCE_DIR/source/orc/build.sh {code} But it still builds the compiler and dependencies in the first run. > Toolchain build should be able to use prebuilt artifacts > > > Key: IMPALA-3192 > URL: https://issues.apache.org/jira/browse/IMPALA-3192 > Project: IMPALA > Issue Type: Task > Components: Infrastructure >Affects Versions: Impala 2.5.0 >Reporter: casey >Priority: Minor > > The toolchain build should have an option (maybe the default) to only build > what isn't already available for download. Currently, if you want to build > the toolchain locally it builds everything. I think the most common use case > for a local build is when you want to add something. In that case, you don't > want to redo the work of building existing components, they can just be > downloaded. > This would also help avoid issues like > https://issues.cloudera.org/browse/IMPALA-3191 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Assigned] (IMPALA-12266) Sporadic failure after migrating a table to Iceberg
[ https://issues.apache.org/jira/browse/IMPALA-12266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang reassigned IMPALA-12266: --- Assignee: (was: Quanlong Huang) > Sporadic failure after migrating a table to Iceberg > --- > > Key: IMPALA-12266 > URL: https://issues.apache.org/jira/browse/IMPALA-12266 > Project: IMPALA > Issue Type: Bug > Components: fe >Affects Versions: Impala 4.2.0 >Reporter: Tamas Mate >Priority: Major > Labels: impala-iceberg > Attachments: > catalogd.bd40020df22b.invalid-user.log.INFO.20230704-181939.1, > impalad.6c0f48d9ce66.invalid-user.log.INFO.20230704-181940.1 > > > TestIcebergTable.test_convert_table test failed in a recent verify job's > dockerised tests: > https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/7629 > {code:none} > E ImpalaBeeswaxException: ImpalaBeeswaxException: > EINNER EXCEPTION: > EMESSAGE: AnalysisException: Failed to load metadata for table: > 'parquet_nopartitioned' > E CAUSED BY: TableLoadingException: Could not load table > test_convert_table_cdba7383.parquet_nopartitioned from catalog > E CAUSED BY: TException: > TGetPartialCatalogObjectResponse(status:TStatus(status_code:GENERAL, > error_msgs:[NullPointerException: null]), lookup_status:OK) > {code} > {code:none} > E0704 19:09:22.980131 833 JniUtil.java:183] > 7145c21173f2c47b:2579db55] Error in Getting partial catalog object of > TABLE:test_convert_table_cdba7383.parquet_nopartitioned. 
Time spent: 49ms > I0704 19:09:22.980309 833 jni-util.cc:288] > 7145c21173f2c47b:2579db55] java.lang.NullPointerException > at > org.apache.impala.catalog.CatalogServiceCatalog.replaceTableIfUnchanged(CatalogServiceCatalog.java:2357) > at > org.apache.impala.catalog.CatalogServiceCatalog.getOrLoadTable(CatalogServiceCatalog.java:2300) > at > org.apache.impala.catalog.CatalogServiceCatalog.doGetPartialCatalogObject(CatalogServiceCatalog.java:3587) > at > org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3513) > at > org.apache.impala.catalog.CatalogServiceCatalog.getPartialCatalogObject(CatalogServiceCatalog.java:3480) > at > org.apache.impala.service.JniCatalog.lambda$getPartialCatalogObject$11(JniCatalog.java:397) > at > org.apache.impala.service.JniCatalogOp.lambda$execAndSerialize$1(JniCatalogOp.java:90) > at org.apache.impala.service.JniCatalogOp.execOp(JniCatalogOp.java:58) > at > org.apache.impala.service.JniCatalogOp.execAndSerialize(JniCatalogOp.java:89) > at > org.apache.impala.service.JniCatalogOp.execAndSerializeSilentStartAndFinish(JniCatalogOp.java:109) > at > org.apache.impala.service.JniCatalog.execAndSerializeSilentStartAndFinish(JniCatalog.java:238) > at > org.apache.impala.service.JniCatalog.getPartialCatalogObject(JniCatalog.java:396) > I0704 19:09:22.980324 833 status.cc:129] 7145c21173f2c47b:2579db55] > NullPointerException: null > @ 0x1012f9f impala::Status::Status() > @ 0x187f964 impala::JniUtil::GetJniExceptionMsg() > @ 0xfee920 impala::JniCall::Call<>() > @ 0xfccd0f impala::Catalog::GetPartialCatalogObject() > @ 0xfb55a5 > impala::CatalogServiceThriftIf::GetPartialCatalogObject() > @ 0xf7a691 > impala::CatalogServiceProcessorT<>::process_GetPartialCatalogObject() > @ 0xf82151 impala::CatalogServiceProcessorT<>::dispatchCall() > @ 0xee330f apache::thrift::TDispatchProcessor::process() > @ 0x1329246 > apache::thrift::server::TAcceptQueueServer::Task::run() > @ 0x1315a89 
impala::ThriftThread::RunRunnable() > @ 0x131773d > boost::detail::function::void_function_obj_invoker0<>::invoke() > @ 0x195ba8c impala::Thread::SuperviseThread() > @ 0x195c895 boost::detail::thread_data<>::run() > @ 0x23a03a7 thread_proxy > @ 0x7faaad2a66ba start_thread > @ 0x7f2c151d clone > E0704 19:09:23.006968 833 catalog-server.cc:278] > 7145c21173f2c47b:2579db55] NullPointerException: null > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-13037) EventsProcessorStressTest can hang
[ https://issues.apache.org/jira/browse/IMPALA-13037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840619#comment-17840619 ] Quanlong Huang commented on IMPALA-13037: - Also checked logs of hive-server2. There is an application id printed by the same thread:
{noformat}
2024-04-22T20:17:56,360 INFO [HiveServer2-Background-Pool: Thread-159] tez.TezTask: Subscribed to counters: [] for queryId: jenkins_20240422201755_96876acb-ee10-409e-a6da-bd1a9b4bc6df
2024-04-22T20:17:56,360 INFO [HiveServer2-Background-Pool: Thread-159] tez.TezTask: Tez session hasn't been created yet. Opening session
2024-04-22T20:17:56,360 INFO [HiveServer2-Background-Pool: Thread-159] tez.TezSessionState: User of session id d6d65f07-cdff-4f5c-bbb0-b2fa24d2d1cc is jenkins
2024-04-22T20:17:56,369 INFO [HiveServer2-Background-Pool: Thread-159] tez.DagUtils: Localizing resource because it does not exist: file:/data/jenkins/workspace/impala-asf-master-exhaustive-release/repos/Impala/fe/target/dependency/postgresql-42.5.1.jar to dest: hdfs://localhost:20500/tmp/hive/jenkins/_tez_session_dir/d6d65f07-cdff-4f5c-bbb0-b2fa24d2d1cc-resources/postgresql-42.5.1.jar
2024-04-22T20:17:56,549 INFO [HiveServer2-Background-Pool: Thread-159] tez.DagUtils: Resource modification time: 1713842276519 for hdfs://localhost:20500/tmp/hive/jenkins/_tez_session_dir/d6d65f07-cdff-4f5c-bbb0-b2fa24d2d1cc-resources/postgresql-42.5.1.jar
2024-04-22T20:17:56,625 INFO [HiveServer2-Background-Pool: Thread-159] tez.TezSessionState: Created new resources: null
2024-04-22T20:17:56,627 INFO [HiveServer2-Background-Pool: Thread-159] tez.DagUtils: Jar dir is null / directory doesn't exist.
Choosing HIVE_INSTALL_DIR - /user/jenkins/.hiveJars
2024-04-22T20:17:57,105 INFO [HiveServer2-Background-Pool: Thread-159] tez.TezSessionState: Computed sha: 77f0dcaafc28cfe7b2d805cdf2d3a083370b2299011e98eb893bd9573e3d4c10 for file: file:/data0/jenkins/workspace/impala-asf-master-exhaustive-release/Impala-Toolchain/cdp_components-45689292/apache-hive-3.1.3000.7.2.18.0-369-bin/lib/hive-exec-3.1.3000.7.2.18.0-369.jar of length: 74.73MB in 474 ms
2024-04-22T20:17:57,109 INFO [HiveServer2-Background-Pool: Thread-159] tez.DagUtils: Resource modification time: 1713837749334 for hdfs://localhost:20500/user/jenkins/.hiveJars/hive-exec-3.1.3000.7.2.18.0-369-77f0dcaafc28cfe7b2d805cdf2d3a083370b2299011e98eb893bd9573e3d4c10.jar
2024-04-22T20:17:57,227 INFO [HiveServer2-Background-Pool: Thread-159] counters.Limits: Counter limits initialized with parameters: GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=1200
2024-04-22T20:17:57,227 INFO [HiveServer2-Background-Pool: Thread-159] counters.Limits: Counter limits initialized with parameters: GROUP_NAME_MAX=256, MAX_GROUPS=500, COUNTER_NAME_MAX=64, MAX_COUNTERS=120
2024-04-22T20:17:57,227 INFO [HiveServer2-Background-Pool: Thread-159] client.TezClient: Tez Client Version: [ component=tez-api, version=0.9.1.7.2.18.0-369, revision=590a68b8a743783155fea2e6f2026f01a8775635, SCM-URL=scm:git:https://git-wip-us.apache.org/repos/asf/tez.git, buildTime=2023-09-28T12:31:39Z ]
2024-04-22T20:17:57,227 INFO [HiveServer2-Background-Pool: Thread-159] tez.TezSessionState: Opening new Tez Session (id: d6d65f07-cdff-4f5c-bbb0-b2fa24d2d1cc, scratch dir: hdfs://localhost:20500/tmp/hive/jenkins/_tez_session_dir/d6d65f07-cdff-4f5c-bbb0-b2fa24d2d1cc)
2024-04-22T20:17:57,293 INFO [HiveServer2-Background-Pool: Thread-159] client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
2024-04-22T20:17:57,575 INFO [HiveServer2-Background-Pool: Thread-159] client.TezClient: Session mode. Starting session.
2024-04-22T20:17:57,664 INFO [HiveServer2-Background-Pool: Thread-159] client.TezClientUtils: Ignoring 'tez.lib.uris' since 'tez.ignore.lib.uris' is set to true
2024-04-22T20:17:57,675 INFO [HiveServer2-Background-Pool: Thread-159] client.TezClient: Tez system stage directory hdfs://localhost:20500/tmp/hive/jenkins/_tez_session_dir/d6d65f07-cdff-4f5c-bbb0-b2fa24d2d1cc/.tez/application_1713840366821_0001 doesn't exist and is created
2024-04-22T20:17:57,699 INFO [HiveServer2-Background-Pool: Thread-159] conf.Configuration: resource-types.xml not found
2024-04-22T20:17:57,699 INFO [HiveServer2-Background-Pool: Thread-159] resource.ResourceUtils: Unable to find 'resource-types.xml'.
2024-04-22T20:17:57,704 INFO [HiveServer2-Background-Pool: Thread-159] common.TezYARNUtils: Ignoring 'tez.lib.uris' since 'tez.ignore.lib.uris' is set to true
2024-04-22T20:17:57,715 INFO [HiveServer2-Background-Pool: Thread-159] Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2024-04-22T20:17:58,223 INFO [HiveServer2-Background-Pool: Thread-159] impl.YarnClientImpl: Submitted application application_1713840366821_0001
2024-04-22T20:17:58,226
[jira] [Commented] (IMPALA-13034) Add logs for slow HTTP requests dumping the profile
[ https://issues.apache.org/jira/browse/IMPALA-13034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840598#comment-17840598 ] Quanlong Huang commented on IMPALA-13034: - Yeah, IMPALA-9380 only helps with finalizing (unregistering) a query. These HTTP requests arrive while the query is still running. We need a lock to protect such read requests from concurrent modification of the profile. Probably we can add a more fine-grained lock just for reading/writing the profile, so that clients fetching query results are not blocked.
> Add logs for slow HTTP requests dumping the profile
> ---
>
> Key: IMPALA-13034
> URL: https://issues.apache.org/jira/browse/IMPALA-13034
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Quanlong Huang
> Priority: Critical
>
> There are several endpoints in the WebUI that can dump a query profile: /query_profile, /query_profile_encoded, /query_profile_plain_text, /query_profile_json
> The HTTP handler thread goes into ImpalaServer::GetRuntimeProfileOutput(), which acquires the lock of the ClientRequestState. This could block client requests fetching query results. We should add warning logs when such HTTP requests run slow (e.g. when the profile is too large to download in a short time). The IP address and other info of such requests should also be logged.
> Related code:
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-server.cc#L736
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-beeswax-server.cc#L601
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-hs2-server.cc#L207
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
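The fine-grained lock proposed in the comment could be sketched roughly as below. This is a hypothetical illustration, not Impala's actual code: ProfileHolder and its members are made-up names, and in the real server this reader/writer lock would live alongside the existing ClientRequestState lock.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical sketch: give the profile its own reader/writer lock, decoupled
// from the main ClientRequestState lock. Writers (execution threads updating
// the profile) take the exclusive lock; readers (WebUI handlers dumping the
// profile) take the shared lock, so a slow profile dump no longer blocks
// result fetches that only touch other parts of the query state.
class ProfileHolder {
 public:
  void Update(const std::string& text) {
    std::unique_lock<std::shared_mutex> lock(mutex_);  // exclusive
    profile_text_ = text;
  }

  std::string Dump() const {
    std::shared_lock<std::shared_mutex> lock(mutex_);  // shared, many readers
    return profile_text_;
  }

 private:
  mutable std::shared_mutex mutex_;
  std::string profile_text_;  // stand-in for the serialized runtime profile
};
```

Multiple HTTP handlers can then call Dump() concurrently while only Update() serializes against them.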
[jira] [Commented] (IMPALA-13028) libkudu_client.so is not stripped in the DEB/RPM packages
[ https://issues.apache.org/jira/browse/IMPALA-13028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840597#comment-17840597 ] Quanlong Huang commented on IMPALA-13028: - Probably because it has lots of unused dependencies: IMPALA-12955. I think it'd be nice to have it if we can reduce its size.
> libkudu_client.so is not stripped in the DEB/RPM packages
> -
>
> Key: IMPALA-13028
> URL: https://issues.apache.org/jira/browse/IMPALA-13028
> Project: IMPALA
> Issue Type: Bug
> Components: Infrastructure
> Reporter: Quanlong Huang
> Assignee: XiangYang
> Priority: Major
>
> The current DEB package is 611M on ubuntu18.04. Here are the top-10 largest files:
> {noformat}
> 14 MB ./opt/impala/lib/jars/hive-standalone-metastore-3.1.3000.7.2.18.0-369.jar
> 15 MB ./opt/impala/lib/jars/kudu-client-e742f86f6d.jar
> 20 MB ./opt/impala/lib/native/libstdc++.so.6.0.28
> 22 MB ./opt/impala/lib/jars/js-22.3.0.jar
> 29 MB ./opt/impala/lib/jars/iceberg-hive-runtime-1.3.1.7.2.18.0-369.jar
> 60 MB ./opt/impala/lib/jars/ozone-filesystem-hadoop3-1.3.0.7.2.18.0-369.jar
> 84 MB ./opt/impala/util/impala-profile-tool
> 85 MB ./opt/impala/sbin/impalad
> 175 MB ./opt/impala/lib/jars/impala-minimal-s3a-aws-sdk-4.4.0-SNAPSHOT.jar
> 188 MB ./opt/impala/lib/native/libkudu_client.so.0.1.0{noformat}
> It appears that we just strip binaries built by Impala, e.g. impalad and impala-profile-tool.
> libkudu_client.so.0.1.0 remains the same as the one in the toolchain folder.
> {code:bash}
> $ ll -th toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/release/lib/libkudu_client.so.0.1.0
> -rw-r--r-- 1 quanlong quanlong 189M 10月 18 2023 toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/release/lib/libkudu_client.so.0.1.0
> $ file toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/release/lib/libkudu_client.so.0.1.0
> toolchain/toolchain-packages-gcc10.4.0/kudu-e742f86f6d/release/lib/libkudu_client.so.0.1.0: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, with debug_info, not stripped{code}
> CC [~yx91490] [~boroknagyz] [~rizaon]
[jira] [Created] (IMPALA-13034) Add logs for slow HTTP requests dumping the profile
Quanlong Huang created IMPALA-13034:
---
Summary: Add logs for slow HTTP requests dumping the profile
Key: IMPALA-13034
URL: https://issues.apache.org/jira/browse/IMPALA-13034
Project: IMPALA
Issue Type: Bug
Components: Backend
Reporter: Quanlong Huang

There are several endpoints in the WebUI that can dump a query profile: /query_profile, /query_profile_encoded, /query_profile_plain_text, /query_profile_json

The HTTP handler thread goes into ImpalaServer::GetRuntimeProfileOutput(), which acquires the lock of the ClientRequestState. This could block client requests fetching query results. We should add warning logs when such HTTP requests run slow (e.g. when the profile is too large to download in a short time).

Related code:
https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-server.cc#L736
https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-beeswax-server.cc#L601
https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-hs2-server.cc#L207
[jira] [Updated] (IMPALA-13034) Add logs for slow HTTP requests dumping the profile
[ https://issues.apache.org/jira/browse/IMPALA-13034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-13034:
Description:
There are several endpoints in the WebUI that can dump a query profile: /query_profile, /query_profile_encoded, /query_profile_plain_text, /query_profile_json
The HTTP handler thread goes into ImpalaServer::GetRuntimeProfileOutput(), which acquires the lock of the ClientRequestState. This could block client requests fetching query results. We should add warning logs when such HTTP requests run slow (e.g. when the profile is too large to download in a short time). The IP address and other info of such requests should also be logged.
Related code:
https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-server.cc#L736
https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-beeswax-server.cc#L601
https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-hs2-server.cc#L207

was:
There are several endpoints in the WebUI that can dump a query profile: /query_profile, /query_profile_encoded, /query_profile_plain_text, /query_profile_json
The HTTP handler thread goes into ImpalaServer::GetRuntimeProfileOutput(), which acquires the lock of the ClientRequestState. This could block client requests fetching query results. We should add warning logs when such HTTP requests run slow (e.g. when the profile is too large to download in a short time).
Related code:
https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-server.cc#L736
https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-beeswax-server.cc#L601
https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-hs2-server.cc#L207

> Add logs for slow HTTP requests dumping the profile
> ---
>
> Key: IMPALA-13034
> URL: https://issues.apache.org/jira/browse/IMPALA-13034
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Reporter: Quanlong Huang
> Priority: Critical
>
> There are several endpoints in the WebUI that can dump a query profile: /query_profile, /query_profile_encoded, /query_profile_plain_text, /query_profile_json
> The HTTP handler thread goes into ImpalaServer::GetRuntimeProfileOutput(), which acquires the lock of the ClientRequestState. This could block client requests fetching query results. We should add warning logs when such HTTP requests run slow (e.g. when the profile is too large to download in a short time). The IP address and other info of such requests should also be logged.
> Related code:
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-server.cc#L736
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-beeswax-server.cc#L601
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/service/impala-hs2-server.cc#L207
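The warning log proposed for IMPALA-13034 could look roughly like the sketch below. All names here (TimedProfileDump, DumpProfile, kSlowDumpThresholdMs) are hypothetical stand-ins rather than Impala's actual API; a real implementation would time the call inside GetRuntimeProfileOutput()'s HTTP handler and also log the query id.

```cpp
#include <chrono>
#include <cstdint>
#include <iostream>
#include <string>

// Hypothetical sketch of the proposed slow-request warning: time the profile
// dump and, if it exceeds a threshold, log the elapsed time and the client's
// IP address so slow downloads of huge profiles become visible in the logs.
constexpr int64_t kSlowDumpThresholdMs = 1000;  // illustrative threshold

std::string DumpProfile() {
  return "<serialized profile>";  // stand-in for the real profile dump
}

// Returns the elapsed time in milliseconds so callers can inspect it.
int64_t TimedProfileDump(const std::string& client_ip, std::string* out) {
  auto start = std::chrono::steady_clock::now();
  *out = DumpProfile();
  int64_t elapsed_ms = std::chrono::duration_cast<std::chrono::milliseconds>(
      std::chrono::steady_clock::now() - start).count();
  if (elapsed_ms > kSlowDumpThresholdMs) {
    std::cerr << "Slow profile dump took " << elapsed_ms << " ms for client "
              << client_ip << std::endl;
  }
  return elapsed_ms;
}
```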
[jira] [Updated] (IMPALA-13033) impala-profile-tool should support parsing thrift profiles downloaded from WebUI
[ https://issues.apache.org/jira/browse/IMPALA-13033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang updated IMPALA-13033:
Labels: newbie ramp-up (was: )

> impala-profile-tool should support parsing thrift profiles downloaded from WebUI
>
> Key: IMPALA-13033
> URL: https://issues.apache.org/jira/browse/IMPALA-13033
> Project: IMPALA
> Issue Type: Bug
> Components: Infrastructure
> Reporter: Quanlong Huang
> Priority: Major
> Labels: newbie, ramp-up
>
> In the coordinator WebUI, users can download query profiles in text/json/thrift formats. The thrift profile is the same as one line in the profile log without the timestamp and query id at the beginning. impala-profile-tool fails to parse such a file. It should retry parsing the whole line as the encoded profile. Current code snippet:
> {code:cpp}
> // Parse out fields from the line.
> istringstream liness(line);
> int64_t timestamp;
> string query_id, encoded_profile;
> liness >> timestamp >> query_id >> encoded_profile;
> if (liness.fail()) {
>   cerr << "Error parsing line " << lineno << ": '" << line << "'\n";
>   ++errors;
>   continue;
> }{code}
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/util/impala-profile-tool.cc#L109
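A minimal sketch of the suggested fallback, based on the snippet quoted above: if the three-field "<timestamp> <query_id> <encoded_profile>" parse fails, retry treating the entire line as the encoded profile. ParseProfileLine is an illustrative helper, not the tool's actual code, and the real tool would still need to base64-decode and decompress the payload afterwards.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Illustrative fallback for impala-profile-tool's line parser: first try the
// profile-log format "<timestamp> <query_id> <encoded_profile>"; if that
// fails, treat the whole line as the encoded profile, which matches a thrift
// profile downloaded from the WebUI (no timestamp/query id prefix).
bool ParseProfileLine(const std::string& line, std::string* encoded_profile) {
  std::istringstream liness(line);
  int64_t timestamp;
  std::string query_id;
  if (liness >> timestamp >> query_id >> *encoded_profile) return true;
  if (line.empty()) return false;
  // Retry: no "<timestamp> <query_id>" prefix, so the line itself is the
  // encoded profile.
  *encoded_profile = line;
  return true;
}
```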
[jira] [Assigned] (IMPALA-13033) impala-profile-tool should support parsing thrift profiles downloaded from WebUI
[ https://issues.apache.org/jira/browse/IMPALA-13033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quanlong Huang reassigned IMPALA-13033: --- Assignee: (was: Quanlong Huang)

> impala-profile-tool should support parsing thrift profiles downloaded from WebUI
>
> Key: IMPALA-13033
> URL: https://issues.apache.org/jira/browse/IMPALA-13033
> Project: IMPALA
> Issue Type: Bug
> Components: Infrastructure
> Reporter: Quanlong Huang
> Priority: Major
>
> In the coordinator WebUI, users can download query profiles in text/json/thrift formats. The thrift profile is the same as one line in the profile log without the timestamp and query id at the beginning. impala-profile-tool fails to parse such a file. It should retry parsing the whole line as the encoded profile. Current code snippet:
> {code:cpp}
> // Parse out fields from the line.
> istringstream liness(line);
> int64_t timestamp;
> string query_id, encoded_profile;
> liness >> timestamp >> query_id >> encoded_profile;
> if (liness.fail()) {
>   cerr << "Error parsing line " << lineno << ": '" << line << "'\n";
>   ++errors;
>   continue;
> }{code}
> https://github.com/apache/impala/blob/f620e5d5c0bbdb0fd97bac31c7b7439cd13c6d08/be/src/util/impala-profile-tool.cc#L109