Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13993 )

Change subject: IMPALA-8834: Short-circuit partition key scan
......................................................................


Patch Set 16:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/13993/16//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13993/16//COMMIT_MSG@17
PS16, Line 17: THe
> nit: The
Done


http://gerrit.cloudera.org:8080/#/c/13993/16/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/13993/16/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1135
PS16, Line 1135: 100
> nit: should this magic number be replaced with analyzer.getQueryOptions().e
This is a different value - it's an estimate of the relative cost of opening a 
file vs reading a single row. I factored it out into a separate constant. I 
didn't necessarily want to add another tuning parameter.


http://gerrit.cloudera.org:8080/#/c/13993/16/testdata/workloads/functional-planner/queries/PlannerTest/distinct.test
File testdata/workloads/functional-planner/queries/PlannerTest/distinct.test:

http://gerrit.cloudera.org:8080/#/c/13993/16/testdata/workloads/functional-planner/queries/PlannerTest/distinct.test@526
PS16, Line 526:    partition key scan
              :    row-size=0B cardinality=24
> hmm... Did you run the test with some extract options?
This is what I get locally on my branch.

[localhost.EXAMPLE.COM:21000] default> set num_nodes=1; explain select 1 from
  (select count(distinct 1) x from functional.alltypes) t
where t.x is not null;
NUM_NODES set to 1
Query: explain select 1 from
  (select count(distinct 1) x from functional.alltypes) t
where t.x is not null
+------------------------------------------------------------+
| Explain String                                             |
+------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=1.97MB Threads=2 |
| Per-Host Resource Estimates: Memory=138MB                  |
| Codegen disabled by planner                                |
|                                                            |
| PLAN-ROOT SINK                                             |
| |                                                          |
| 02:AGGREGATE [FINALIZE]                                    |
| |  output: count(1)                                        |
| |  having: count(1) IS NOT NULL                            |
| |  row-size=8B cardinality=1                               |
| |                                                          |
| 01:AGGREGATE                                               |
| |  group by: 1                                             |
| |  row-size=1B cardinality=1                               |
| |                                                          |
| 00:SCAN HDFS [functional.alltypes]                         |
|    HDFS partitions=24/24 files=24 size=478.45KB            |
|    partition key scan                                      |
|    row-size=0B cardinality=24                              |
+------------------------------------------------------------+


http://gerrit.cloudera.org:8080/#/c/13993/16/testdata/workloads/functional-query/queries/QueryTest/partition-key-scans.test
File 
testdata/workloads/functional-query/queries/QueryTest/partition-key-scans.test:

http://gerrit.cloudera.org:8080/#/c/13993/16/testdata/workloads/functional-query/queries/QueryTest/partition-key-scans.test@13
PS16, Line 13: aggregation(SUM, RowsReturned): 48
> Are we checking 48 == 2 * NumFiles? If so, how does the factor (2) come? Sh
It's actually the total across all the operators in the plans. I actually 
realised that this assumes a 3-node minicluster, so I pulled the RowsReturned 
check into a separate test that only runs in the HDFS minicluster configuration.

  
+---------------------+--------+-------+----------+----------+-------+------------+-----------+---------------+---------------------+
  | Operator            | #Hosts | #Inst | Avg Time | Max Time | #Rows | Est. 
#Rows | Peak Mem  | Est. Peak Mem | Detail              |
  
+---------------------+--------+-------+----------+----------+-------+------------+-----------+---------------+---------------------+
  | F02:ROOT            | 1      | 1     | 31.57us  | 31.57us  |       |        
    | 0 B       | 0 B           |                     |
  | 04:EXCHANGE         | 1      | 1     | 44.26us  | 44.26us  | 2     | 2      
    | 24.00 KB  | 16.00 KB      | UNPARTITIONED       |
  | F01:EXCHANGE SENDER | 3      | 3     | 88.39us  | 132.93us |       |        
    | 25.59 KB  | 0 B           |                     |
  | 03:AGGREGATE        | 3      | 3     | 1.66ms   | 1.89ms   | 2     | 2      
    | 1.95 MB   | 10.00 MB      | FINALIZE            |
  | 02:EXCHANGE         | 3      | 3     | 26.16us  | 48.34us  | 6     | 2      
    | 48.00 KB  | 16.00 KB      | HASH(`year`)        |
  | F00:EXCHANGE SENDER | 3      | 3     | 158.73us | 169.28us |       |        
    | 80.78 KB  | 0 B           |                     |
  | 01:AGGREGATE        | 3      | 3     | 1.19ms   | 1.25ms   | 6     | 2      
    | 2.02 MB   | 10.00 MB      | STREAMING           |
  | 00:SCAN HDFS        | 3      | 3     | 2.75ms   | 2.92ms   | 24    | 24     
    | 364.00 KB | 128.00 MB     | functional.alltypes |
  
+---------------------+--------+-------+----------+----------+-------+------------+-----------+---------------+---------------------+


http://gerrit.cloudera.org:8080/#/c/13993/16/testdata/workloads/functional-query/queries/QueryTest/partition-key-scans.test@30
PS16, Line 30: order by year, month
> nit: I think we don't need the ORDER BY clause. The test framework will sor
Done


http://gerrit.cloudera.org:8080/#/c/13993/16/tests/query_test/test_queries.py
File tests/query_test/test_queries.py:

http://gerrit.cloudera.org:8080/#/c/13993/16/tests/query_test/test_queries.py@268
PS16, Line 268:     if vector.get_value('exec_option')['mt_dop'] != 0:
I can update this after rebasing



--
To view, visit http://gerrit.cloudera.org:8080/13993
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I26c87525a4f75ffeb654267b89948653b2e1ff8c
Gerrit-Change-Number: 13993
Gerrit-PatchSet: 16
Gerrit-Owner: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Aman Sinha <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Tue, 19 May 2020 17:17:35 +0000
Gerrit-HasComments: Yes

Reply via email to