[Impala-ASF-CR] IMPALA-3748: Part 1: Clean up resource estimation in planner

2017-02-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3748: Part 1: Clean up resource estimation in planner
..


Patch Set 4:

(5 comments)

Some minor cleanup and tests fixes

http://gerrit.cloudera.org:8080/#/c/5847/4/fe/src/main/java/org/apache/impala/planner/AggregationNode.java
File fe/src/main/java/org/apache/impala/planner/AggregationNode.java:

Line 51:   // Default per-host memory requirement used if no valid stats are 
available.
> Need to fix comment.
Done


http://gerrit.cloudera.org:8080/#/c/5847/4/fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
File fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java:

Line 85: long perInstanceInputCardinality = Math.max(1L, 
inputNode.getCardinality() / numInstances);
> Need to fix long lines
Done


Line 138: output.append(PrintUtils.printInstances(" ", 
fragment_.getNumInstances(queryOptions)));
> Need to fix long line
Done


http://gerrit.cloudera.org:8080/#/c/5847/4/fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
File fe/src/main/java/org/apache/impala/planner/KuduTableSink.java:

Line 62:   output.append(PrintUtils.printInstances(" ", 
fragment_.getNumInstances(queryOptions)));
> need to fix long line
Done


http://gerrit.cloudera.org:8080/#/c/5847/4/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

Line 213: return dop * planRoot_.getNumNodes();
> Needs to handle when getNumNodes() is invalid, i.e. -1
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/5847
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3748: Part 1: Clean up resource estimation in planner

2017-02-08 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#5).

Change subject: IMPALA-3748: Part 1: Clean up resource estimation in planner
..

IMPALA-3748: Part 1: Clean up resource estimation in planner

This is in preparation to use this code to compute the minimum
reservation.

The cleanup restructures the code slightly so that it's clearer whether
resource estimates are valid or invalid. It also removes the unused
VCore estimates.

Fixes various bugs and other issues:
* computeCosts() was not called for unpartitioned fragments, so
  the per-operator memory estimate was not visible.
* Nested loop join was not treated as a blocking join.
* The TODO comment about union was misleading
* The logic does not work for mt_dop > 1 because it conflates fragment
  instances with hosts. Fixing this requires identifying places where
  we want per-instance estimates instead of per-host.

I left one bug unfixed because it is subtle and will be easier to review
in isolation: IMPALA-4862.

There is some remaining questionable behaviour I left untouched:
* It's unclear why unpartitioned fragments are always excluded from
  total memory consumption.
* Many operators do not have estimates or have questionable heuristics.

Testing:
Re-enabled the explain_level tests, which appear to be the only
coverage for many of these estimates. Removed the complex and
brittle test cases and replaced with a couple of much simpler
end-to-end tests and a number of planner test cases for
memory estimates in both the MT and non-MT cases.

Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37
---
M be/src/service/query-exec-state.cc
M common/thrift/Frontend.thrift
M docs/shared/impala_common.xml
M docs/topics/impala_explain.xml
M docs/topics/impala_explain_level.xml
M docs/topics/impala_explain_plan.xml
M docs/topics/impala_hbase.xml
M docs/topics/impala_known_issues.xml
M docs/topics/impala_optimize_partition_key_scans.xml
M docs/topics/impala_perf_joins.xml
M docs/topics/impala_perf_stats.xml
M fe/src/main/java/org/apache/impala/common/PrintUtils.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DataSink.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/resource-estimates.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level0.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level1.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test
M tests/custom_cluster/test_admission_controller.py
M tests/metadata/test_explain.py
47 files changed, 1,544 insertions(+), 1,454 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/5847/5
-- 
To view, visit http://gerrit.cloudera.org:8080/5847
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-Reviewer: 

[Impala-ASF-CR] IMPALA-3748: Part 1: Clean up resource estimation in planner

2017-02-07 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#4).

Change subject: IMPALA-3748: Part 1: Clean up resource estimation in planner
..

IMPALA-3748: Part 1: Clean up resource estimation in planner

This is in preparation to use this code to compute the minimum
reservation.

The cleanup restructures the code slightly so that it's clearer whether
resource estimates are valid or invalid. It also removes the unused
VCore estimates.

Fixes various bugs and other issues:
* computeCosts() was not called for unpartitioned fragments, so
  the per-operator memory estimate was not visible.
* Nested loop join was not treated as a blocking join.
* The TODO comment about union was misleading
* The logic does not work for mt_dop > 1 because it conflates fragment
  instances with hosts. Fixing this requires identifying places where
  we want per-instance estimates instead of per-host.

I left one bug unfixed because it is subtle and will be easier to review
in isolation: IMPALA-4862.

There is some remaining questionable behaviour I left untouched:
* It's unclear why unpartitioned fragments are always excluded from
  total memory consumption.
* Many operators do not have estimates or have questionable heuristics.

Testing:
Re-enabled the explain_level tests, which appear to be the only
coverage for many of these estimates. Removed the complex and
brittle test cases and replaced with a couple of much simpler
end-to-end tests and a number of planner test cases for
memory estimates in both the MT and non-MT cases.

Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37
---
M be/src/service/query-exec-state.cc
M common/thrift/Frontend.thrift
M docs/shared/impala_common.xml
M docs/topics/impala_explain.xml
M docs/topics/impala_explain_level.xml
M docs/topics/impala_explain_plan.xml
M docs/topics/impala_hbase.xml
M docs/topics/impala_known_issues.xml
M docs/topics/impala_optimize_partition_key_scans.xml
M docs/topics/impala_perf_joins.xml
M docs/topics/impala_perf_stats.xml
M fe/src/main/java/org/apache/impala/common/PrintUtils.java
M fe/src/main/java/org/apache/impala/planner/AggregationNode.java
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/DataSink.java
M fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/EmptySetNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseScanNode.java
M fe/src/main/java/org/apache/impala/planner/HBaseTableSink.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java
M fe/src/main/java/org/apache/impala/planner/PlanFragment.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/PlanRootSink.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/SortNode.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/resource-estimates.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level0.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level1.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test
M tests/metadata/test_explain.py
46 files changed, 1,535 insertions(+), 1,452 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/5847/4
-- 
To view, visit http://gerrit.cloudera.org:8080/5847
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 


[Impala-ASF-CR] IMPALA-3748: Part 1: Clean up resource estimation in planner

2017-02-06 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-3748: Part 1: Clean up resource estimation in planner
..


Patch Set 3:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/5847/3//COMMIT_MSG
Commit Message:

Line 28: * Many operators do not have estimates or have questionable heuristics.
are you planning on addressing this particular point in a follow-up? the 
estimates will now matter if they are used for reservations, and really bad 
estimates will make the system less usable.


http://gerrit.cloudera.org:8080/#/c/5847/3/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
File fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java:

Line 249: perHostMemCost_ = 0L;
does this make any difference in java?


http://gerrit.cloudera.org:8080/#/c/5847/3/fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java
File fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java:

Line 69:* Computes the estimated per-host memory and CPU requirements of 
this plan node set.
remove references to cpu requirements


Line 105: perHostHdfsScanMem += node.getPerHostMemCost();
this is incorrect for mt execution, because it extrapolates the number of 
executing threads from the number of cores. please fix this in hdfsscannode.


Line 116: Math.min(perHostHbaseScanMem, 
HBaseScanNode.getPerHostMemUpperBound());
same here


Line 123:   sink.computeCosts();
this is also wrong, because it doesn't take sorting into account.

you need to make sure the estimates are at least in the ballpark, otherwise we 
start rejecting queries left and right that used to run.


Line 197: // TODO: can overestimate resource consumption of UnionNodes - 
the execution of union
this needs to get fixed. see earlier comment.


Line 205:* Estimate of the peak resources that will be consumed by a set of 
plan nodes.
move above where it's used


Line 209: private final boolean valid_;
prefixing boolean variables with 'is-" is helpful


Line 214:   this.valid_ = valid;
remove this_


http://gerrit.cloudera.org:8080/#/c/5847/3/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

Line 344:* Estimates the per-host memory and CPU requirements for the given 
plan fragments,
remove references to cpu requirements (here and elsewhere)


Line 375: if (isTrivialCoordOnlyPlan(fragments)) {
check early?


Line 381: // excludeUnpartitionedFragments is true.
mysterious


http://gerrit.cloudera.org:8080/#/c/5847/3/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

Line 369: options.setNum_scanner_threads(1); // Required so that output 
doesn't vary by machine
please also add tests for mt execution


http://gerrit.cloudera.org:8080/#/c/5847/3/fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
File fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java:

Line 413: ignoreExplainHeader, errorLog, actualOutput);
odd indentation


Line 420: actualOutput);
odd indentation, please fix throughout


http://gerrit.cloudera.org:8080/#/c/5847/3/testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test:

Line 46: |  hosts=3 per-host-mem=160B
we need to impose a meaningful minimum, such as the minimal mempool size


Line 52: |  hosts=3 per-host-mem=128.00MB
where did that come from?


http://gerrit.cloudera.org:8080/#/c/5847/3/testdata/workloads/functional-planner/queries/PlannerTest/resource-estimates.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/resource-estimates.test:

Line 120: |  hosts=3 per-host-mem=10.00MB
this should express the minimum required to operate at all, rather than "how 
much would i like to get"


-- 
To view, visit http://gerrit.cloudera.org:8080/5847
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Matthew Jacobs 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-3748: Part 1: Clean up resource estimation in planner

2017-02-01 Thread Tim Armstrong (Code Review)
Tim Armstrong has uploaded a new patch set (#3).

Change subject: IMPALA-3748: Part 1: Clean up resource estimation in planner
..

IMPALA-3748: Part 1: Clean up resource estimation in planner

This is in preparation to use this code to compute the minimum
reservation.

The cleanup restructures the code slightly so that it's clearer whether
resource estimates are valid or invalid. It also removes the unused
VCore estimates.

Fixes a couple of bugs and other issues:
* computeCosts() was not called for unpartitioned fragments, so
  the per-operator memory estimate was not visible.
* Nested loop join was not treated as a blocking join.
* The TODO comment about union was misleading

I left one bug unfixed because it is subtle and will be easier to review
in isolation: IMPALA-4862.

There is some remaining questionable behaviour I left untouched:
* It's unclear why unpartitioned fragments are always excluded from
  total memory consumption.
* Many operators do not have estimates or have questionable heuristics.

Testing:
Re-enabled the explain_level tests, which appear to be the only
coverage for many of these estimates. Removed the complex and
brittle test cases and replaced with a couple of much simpler
end-to-end tests and a number of planner test cases for
memory estimates.

Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37
---
M be/src/service/query-exec-state.cc
M common/thrift/Frontend.thrift
M docs/shared/impala_common.xml
M docs/topics/impala_explain.xml
M docs/topics/impala_explain_level.xml
M docs/topics/impala_explain_plan.xml
M docs/topics/impala_hbase.xml
M docs/topics/impala_known_issues.xml
M docs/topics/impala_optimize_partition_key_scans.xml
M docs/topics/impala_perf_joins.xml
M docs/topics/impala_perf_stats.xml
M fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/main/java/org/apache/impala/planner/PipelinedPlanNodeSet.java
M fe/src/main/java/org/apache/impala/planner/PlanNode.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M 
testdata/workloads/functional-planner/queries/PlannerTest/kudu-selectivity.test
M 
testdata/workloads/functional-planner/queries/PlannerTest/mt-dop-validation.test
A 
testdata/workloads/functional-planner/queries/PlannerTest/resource-estimates.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level0.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level1.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level2.test
M testdata/workloads/functional-query/queries/QueryTest/explain-level3.test
M tests/metadata/test_explain.py
29 files changed, 810 insertions(+), 1,231 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/5847/3
-- 
To view, visit http://gerrit.cloudera.org:8080/5847
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1e358182bcf2bc5fe5c73883eb97878735b12d37
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong