[ https://issues.apache.org/jira/browse/IMPALA-8912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Quanlong Huang resolved IMPALA-8912.
------------------------------------
    Fix Version/s: Impala 3.4.0
       Resolution: Fixed

> Avoid calling computeStats twice on HBaseScanNode
> -------------------------------------------------
>
>                 Key: IMPALA-8912
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8912
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Frontend
>    Affects Versions: Impala 2.8.0, Impala 2.9.0, Impala 2.10.0, Impala 2.11.0, Impala 3.0, Impala 2.12.0, Impala 3.1.0, Impala 3.2.0, Impala 3.3.0
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>             Fix For: Impala 3.4.0
>
>
> For simple queries on HBase tables that have HBaseScanNode as the root of the
> SingleNodePlan, HBaseScanNode#computeStats will be called twice.
> Stacktrace for the first call:
> {code:java}
>     at org.apache.impala.planner.HBaseScanNode.computeStats(HBaseScanNode.java:286)
>     at org.apache.impala.planner.HBaseScanNode.init(HBaseScanNode.java:160)
>     at org.apache.impala.planner.SingleNodePlanner.createScanNode(SingleNodePlanner.java:1405)
>     at org.apache.impala.planner.SingleNodePlanner.createTableRefNode(SingleNodePlanner.java:1582)
>     at org.apache.impala.planner.SingleNodePlanner.createTableRefsPlan(SingleNodePlanner.java:826)
>     at org.apache.impala.planner.SingleNodePlanner.createSelectPlan(SingleNodePlanner.java:662)
>     at org.apache.impala.planner.SingleNodePlanner.createQueryPlan(SingleNodePlanner.java:261)
>     at org.apache.impala.planner.SingleNodePlanner.createSingleNodePlan(SingleNodePlanner.java:151)
>     at org.apache.impala.planner.Planner.createPlan(Planner.java:117)
>     at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1169)
>     at org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:1495)
>     at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1359)
>     at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1250)
>     at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1220)
> {code}
> Stacktrace for the second call:
> {code:java}
>     at org.apache.impala.planner.HBaseScanNode.computeStats(HBaseScanNode.java:286)
>     at org.apache.impala.planner.SingleNodePlanner.createQueryPlan(SingleNodePlanner.java:307)
>     at org.apache.impala.planner.SingleNodePlanner.createSingleNodePlan(SingleNodePlanner.java:151)
>     at org.apache.impala.planner.Planner.createPlan(Planner.java:117)
>     at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1169)
>     at org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:1495)
>     at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1359)
>     at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1250)
>     at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1220)
>     at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:154)
> {code}
> Code of the second call:
> {code:java}
>   private PlanNode createQueryPlan(QueryStmt stmt, Analyzer analyzer, boolean disableTopN)
>       throws ImpalaException {
>     ......
>     if (stmt.evaluateOrderBy() && sortHasMaterializedSlots) {
>       root = createSortNode(analyzer, root, stmt.getSortInfo(), stmt.getLimit(),
>           stmt.getOffset(), stmt.hasLimit(), disableTopN);
>     } else {
>       root.setLimit(stmt.getLimit());
>       root.computeStats(analyzer);  // <--- May call HBaseScanNode#computeStats here
>     }
>     return root;
>   }
> {code}
> Logs for a simple query on an old version of Impala:
> {code:java}
> I0830 11:52:05.991547 41189 Analyzer.java:1578] new pred: stg.xxx_hbase.key >= 'key1' BinaryPredicate{op=>=, SlotRef{path=key, type=STRING, id=0} StringLiteral{value=key1}}
> I0830 11:52:05.991595 41189 Analyzer.java:1578] new pred: stg.xxx_hbase.key <= 'key2' BinaryPredicate{op=<=, SlotRef{path=key, type=STRING, id=0} StringLiteral{value=key2}}
> # <--------- 2 seconds here
> I0830 11:52:08.114225 41189 HBaseScanNode.java:217] computeStats HbaseScan: cardinality=1706076
> I0830 11:52:08.114341 41189 HBaseScanNode.java:223] computeStats HbaseScan: #nodes=100
> I0830 11:52:08.114452 41189 SingleNodePlanner.java:357] createCheapestJoinPlan
> # <--------- 2 seconds here
> I0830 11:52:10.260190 41189 HBaseScanNode.java:217] computeStats HbaseScan: cardinality=1706076
> I0830 11:52:10.260303 41189 HBaseScanNode.java:223] computeStats HbaseScan: #nodes=100
> I0830 11:52:10.260387 41189 SingleNodePlanner.java:357] createCheapestJoinPlan
> {code}
> Such queries are usually point queries and are expected to return quickly.
> HBaseScanNode#computeStats is expensive because it requires RPCs to HBase,
> so we should avoid calling it twice.
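
The double call described in the issue (once from init, once after setLimit in createQueryPlan) can be illustrated with a small standalone sketch. This is only one possible way to avoid the repeated work, not necessarily the change that went into Impala 3.4.0. The class `CachedStatsNode` and its `expensiveEstimate` supplier are hypothetical stand-ins for HBaseScanNode and its HBase RPCs; nothing here is Impala API.

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a scan node whose stats require a slow remote call. */
class CachedStatsNode {
    // Stands in for the HBase RPCs that make the real computeStats slow (assumption).
    private final LongSupplier expensiveEstimate;
    private boolean estimated = false; // true once the expensive estimate has run
    private long rawCardinality;       // cached result of the expensive estimate
    private long limit = -1;           // -1 means "no limit"

    CachedStatsNode(LongSupplier expensiveEstimate) {
        this.expensiveEstimate = expensiveEstimate;
    }

    void setLimit(long limit) {
        this.limit = limit;
    }

    /** Runs the expensive estimate at most once; later calls reuse the cached value. */
    long computeStats() {
        if (!estimated) {
            rawCardinality = expensiveEstimate.getAsLong();
            estimated = true;
        }
        // Capping by the limit is cheap, so it is reapplied on every call.
        return limit >= 0 ? Math.min(rawCardinality, limit) : rawCardinality;
    }
}
```

In this sketch the first `computeStats()` call plays the role of the call from init, and a second call after `setLimit(...)` plays the role of the call from createQueryPlan: the limit is still honored, but the supplier runs only once.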