Bharath Vissapragada has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12221 )


Change subject: [PROTOTYPE] IMPALA-5872: Test case builder for query planner
......................................................................

[PROTOTYPE] IMPALA-5872: Test case builder for query planner

This patch implements a new "test case" builder for simulating
query plans from one cluster on a different cluster/minicluster with
a different number of nodes.

A "test case" in the context of this patch is a single file that includes
all the information that is needed to reproduce the query plan of a
given query statement. The typical workflow is:

1) Collect the testcase of a given QueryStmt in cluster A.
2) Copy the testcase output file to cluster B.
3) Load the testcase on cluster B.
4) Run EXPLAIN <query> to make sure the plan matches (including the
number of hosts).

Motivation:
----------
- Make query planner issues more debuggable
- Improve user experience while collecting query diagnostics
- Make it easy to test new planner features by testing them on customer
  use cases collected from much larger clusters.

Caveats:
------
- The tool does not collect actual data files for the tables. Only the
  metadata state is dumped.
- Currently it only imports databases/tables/views. We can extend it to
  work for UDFs etc.
- It only works for QueryStmts (select/union queries).
- Once the metadata dump is loaded on a target cluster, the state is
  volatile. Hence it cannot survive a cluster restart or an INVALIDATE
  METADATA.
- Loading a testcase requires setting the query option (SET
  PLANNER_DEBUG_MODE=true) so that the planner knows to fake the number
  of hosts. Otherwise it takes into account the local cluster topology.

This patch adds two new SQL statements:
(full end-to-end example in gerrit comments)

For exporting a testcase:
-------------------------

EXPORT TESTCASE INTO OUTFILE '<hdfs dir>' <query stmt>;
<outputs the testcase file path>

For loading a testcase:
----------------------

SET PLANNER_DEBUG_MODE=true;
LOAD TESTCASE FROM '<testcase output path>';
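
The two statements above, combined with the workflow steps, can be
sketched as a small helper that produces the statement sequence for each
cluster. This is only an illustration: the function names, the example
paths and the example query are hypothetical and are not part of the
patch. Only the statement syntax is taken from this commit message.

```python
from typing import List


def _normalize(query: str) -> str:
    """Strip surrounding whitespace and a trailing semicolon, if any."""
    return query.strip().rstrip(";").strip()


def export_statement(hdfs_dir: str, query: str) -> str:
    """Statement to run on the source cluster (cluster A)."""
    return f"EXPORT TESTCASE INTO OUTFILE '{hdfs_dir}' {_normalize(query)};"


def load_statements(testcase_path: str, query: str) -> List[str]:
    """Statements to run, in order, on the target cluster (cluster B)."""
    return [
        # Must be set so the planner fakes the number of hosts.
        "SET PLANNER_DEBUG_MODE=true;",
        f"LOAD TESTCASE FROM '{testcase_path}';",
        # Compare this plan with the one seen on cluster A.
        f"EXPLAIN {_normalize(query)};",
    ]


if __name__ == "__main__":
    # Hypothetical query and paths, for illustration only.
    query = "select count(*) from db1.t1"
    print(export_statement("/tmp/testcases", query))
    for stmt in load_statements("/tmp/testcases/example-testcase", query):
        print(stmt)
```

The path passed to load_statements() is whatever the EXPORT statement
printed on cluster A, after the file has been copied to cluster B.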

How does it work?
-----------------

- During export on the source cluster, the command dumps all the thrift
  states of referenced objects in the query into a gzipped binary file.
- During load on a target cluster, it adds these objects to the catalog
  cache by faking them as DDLs.
- The planner also fakes the number of hosts by using the scan range
  information from the target cluster.
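
The export/load round trip described above can be sketched roughly as
follows. This is a simplified model under stated assumptions, not the
code in this patch: JSON stands in for thrift serialization, a plain
dict stands in for the catalog cache, storing the query text in the file
is an assumption, and all function and field names are hypothetical.

```python
import gzip
import json
from typing import Dict, List


def export_testcase(path: str, query: str, objects: List[dict]) -> None:
    """Dump the query and the state of its referenced objects into one
    gzipped binary file (JSON is a stand-in for thrift here)."""
    payload = json.dumps({"query": query, "objects": objects}).encode("utf-8")
    with gzip.open(path, "wb") as f:
        f.write(payload)


def load_testcase(path: str, catalog_cache: Dict[str, dict]) -> str:
    """Read the dump and add every object to the in-memory cache, as if
    each one had been created by a DDL. Returns the original query."""
    with gzip.open(path, "rb") as f:
        payload = json.loads(f.read().decode("utf-8"))
    for obj in payload["objects"]:
        # Existing entries with the same name are overwritten.
        catalog_cache[obj["name"]] = obj
    return payload["query"]
```

Because the loaded objects live only in the in-memory cache in this
model, they are lost when the cache is rebuilt, which mirrors the
volatility caveat listed earlier.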

** The patch is just meant to be a prototype to gather some initial
feedback. It needs much more polish to be review-ready (comments,
logic refactoring, unit/end-to-end tests).

Change-Id: Iec83eeb2dc5136768b70ed581fb8d3ed0335cb52
---
M be/src/service/client-request-state.cc
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/CatalogService.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/JniCatalog.thrift
M common/thrift/Types.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
A fe/src/main/java/org/apache/impala/analysis/ExportTestCaseStmt.java
A fe/src/main/java/org/apache/impala/analysis/LoadTestCaseStmt.java
M fe/src/main/java/org/apache/impala/analysis/StmtRewriter.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/common/FileSystemUtil.java
M fe/src/main/java/org/apache/impala/common/JniUtil.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/main/jflex/sql-scanner.flex
24 files changed, 453 insertions(+), 28 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/12221/1
--
To view, visit http://gerrit.cloudera.org:8080/12221
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Iec83eeb2dc5136768b70ed581fb8d3ed0335cb52
Gerrit-Change-Number: 12221
Gerrit-PatchSet: 1
Gerrit-Owner: Bharath Vissapragada <[email protected]>
