Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13993 )

Change subject: IMPALA-8834: Short-circuit partition key scan
......................................................................


Patch Set 14:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/13993/14//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13993/14//COMMIT_MSG@11
PS14, Line 11: It can therefore be safely enabled by default.
> I suppose the enabling by default will be a separate patch ?
Reworded to make what this actually does clearer. This is adding a weaker 
version of the optimisation that is always enabled. The old version is still 
more efficient since it's metadata only, so is still enabled behind the flag.


http://gerrit.cloudera.org:8080/#/c/13993/14//COMMIT_MSG@55
PS14, Line 55: Testing:
> Thanks for adding the detailed tests :)
ack


http://gerrit.cloudera.org:8080/#/c/13993/14/be/src/exec/hdfs-scan-node-mt.cc
File be/src/exec/hdfs-scan-node-mt.cc:

http://gerrit.cloudera.org:8080/#/c/13993/14/be/src/exec/hdfs-scan-node-mt.cc@98
PS14, Line 98:   row_batch->limit_capacity(1);
> Should this limiting be under an if(is_partition_key_scan_) check ?
Yes! This was a bug causing some of the test failures.


http://gerrit.cloudera.org:8080/#/c/13993/14/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/13993/14/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1134
PS14, Line 1134:       long numRangesAdjusted = numRanges * 100;
> I can see why the adjustment would be needed  .. to ensure a distributed sc
I think it's not possible to have that many scan ranges (aside from anything 
else, it wouldn't fit in the various data structures). But I think it's a good 
pattern to use a safe multiply so I changed it.


http://gerrit.cloudera.org:8080/#/c/13993/14/testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test:

http://gerrit.cloudera.org:8080/#/c/13993/14/testdata/workloads/functional-planner/queries/PlannerTest/disable-codegen.test@21
PS14, Line 21:    HDFS partitions=24/24 files=24 size=478.45KB
> it wasn't obvious to me why the HDFS prefix got added based on the changes
This was a side-effect of regenerating the planner test output. The Planner 
tests are rigged to ignore this part of the plan output so the test output 
wasn't updated when these prefixes were added.



--
To view, visit http://gerrit.cloudera.org:8080/13993
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I26c87525a4f75ffeb654267b89948653b2e1ff8c
Gerrit-Change-Number: 13993
Gerrit-PatchSet: 14
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Tue, 21 Apr 2020 19:26:46 +0000
Gerrit-HasComments: Yes

Reply via email to