Joe McDonnell has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21541 )

Change subject: IMPALA-12906: Incorporate scan range information into the tuple 
cache key
......................................................................


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/21541/1/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/21541/1/be/src/exec/hdfs-scan-node-base.cc@171
PS1, Line 171:     deterministic_scanrange_assignment_ =
> It's a little silly we copy all these values when tnode is preserved in Pla
Good point. I dropped this extra field.


http://gerrit.cloudera.org:8080/#/c/21541/1/be/src/exec/tuple-cache-node.h
File be/src/exec/tuple-cache-node.h:

http://gerrit.cloudera.org:8080/#/c/21541/1/be/src/exec/tuple-cache-node.h@62
PS1, Line 62:   const std::vector<int32_t> input_scan_node_ids_;
> nit: Could these be references to the tnode_ data? Or does that have a diff
Changed this to drop input_scan_node_ids_ and compile_time_key_ as fields on 
this class. The code can just refer to the field on the plan node directly.
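
For readers following along, here is a rough sketch of the idea behind this change, not the code in the patch: the final cache key combines the planner's compile-time key with a hash of the scan ranges that feed the cache node. The ScanRange fields, the tuple_cache_key name, and the choice to sort the ranges are all assumptions made for illustration.

```python
import hashlib
from typing import Iterable, NamedTuple


class ScanRange(NamedTuple):
    # Hypothetical fields; a real scan range carries more detail.
    path: str
    offset: int
    length: int
    mtime: int


def tuple_cache_key(compile_time_key: str, scan_ranges: Iterable[ScanRange]) -> str:
    """Combine a compile-time key with a hash of the input scan ranges."""
    h = hashlib.sha256()
    # Assumption of this sketch: sort the ranges so the key does not depend
    # on the order in which they were handed to the scan node.
    for r in sorted(scan_ranges):
        h.update(f"{r.path}\x00{r.offset}\x00{r.length}\x00{r.mtime}\n".encode("utf-8"))
    return f"{compile_time_key}_{h.hexdigest()}"
```

With this shape, two fragments that share a plan but read different files or byte ranges get different keys, while the same ranges in a different order map to the same key.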


http://gerrit.cloudera.org:8080/#/c/21541/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/21541/1/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1959
PS1, Line 1959:     if (!serialCtx.isTupleCache()) {
> I don't understand this conditional.
I added a comment here. Basically, the code guarded by this conditional 
doesn't add any information when computing the tuple cache key, so it is 
skipped in that case.
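
As an illustration only (this is not the HdfsScanNode code, and every field name below is a placeholder), the pattern looks roughly like this: the serializer takes a flag saying whether it is producing a tuple cache key, and leaves out anything that adds no information to the key.

```python
from typing import Dict


def serialize_scan_node(node: Dict[str, object], for_tuple_cache: bool) -> Dict[str, object]:
    """Return the fields to serialize. All field names are hypothetical."""
    out = {"table": node["table"], "predicates": node["predicates"]}
    if not for_tuple_cache:
        # Placeholder for a field that adds no information to the cache key,
        # so it is only emitted for the regular (non-cache) serialization.
        out["extra_detail"] = node["extra_detail"]
    return out
```

Two nodes that differ only in the omitted field then serialize identically for cache-key purposes.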


http://gerrit.cloudera.org:8080/#/c/21541/1/fe/src/main/java/org/apache/impala/planner/TupleCacheNode.java
File fe/src/main/java/org/apache/impala/planner/TupleCacheNode.java:

http://gerrit.cloudera.org:8080/#/c/21541/1/fe/src/main/java/org/apache/impala/planner/TupleCacheNode.java@125
PS1, Line 125:     return output.toString();
> Should we print input scan nodes here?
Good idea, added that here


http://gerrit.cloudera.org:8080/#/c/21541/1/tests/custom_cluster/test_tuple_cache.py
File tests/custom_cluster/test_tuple_cache.py:

http://gerrit.cloudera.org:8080/#/c/21541/1/tests/custom_cluster/test_tuple_cache.py@196
PS1, Line 196:     for mt_dop in [0, 1]:
> Could this be done with @pytest.mark.parametrize instead?
I added a base class for these tests and moved them into their own class. 
The new class has mt_dop as a test dimension.
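
To make "test dimension" concrete, here is a minimal standalone sketch of the concept. It is not the Impala test framework: expand_dimensions, TupleCacheTestBase, and ScanRangeTests are names invented for this example. Each test runs once per combination of dimension values, which replaces the explicit loop over mt_dop.

```python
import itertools
from typing import Dict, Iterator, List


def expand_dimensions(dimensions: Dict[str, List[object]]) -> Iterator[Dict[str, object]]:
    """Yield one test vector per combination of dimension values."""
    names = list(dimensions)
    for values in itertools.product(*(dimensions[n] for n in names)):
        yield dict(zip(names, values))


class TupleCacheTestBase:
    """Hypothetical base class holding helpers shared by the test classes."""

    def exec_options(self, vector: Dict[str, object]) -> Dict[str, object]:
        # Pass the dimension value through as a query option.
        return {"mt_dop": vector["mt_dop"]}


class ScanRangeTests(TupleCacheTestBase):
    # Each test in this class runs once with mt_dop=0 and once with mt_dop=1.
    DIMENSIONS = {"mt_dop": [0, 1]}
```

A runner would call each test once per vector from expand_dimensions(ScanRangeTests.DIMENSIONS), so adding another dimension later does not require touching the test bodies.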


http://gerrit.cloudera.org:8080/#/c/21541/1/tests/custom_cluster/test_tuple_cache.py@287
PS1, Line 287:   def test_scan_range_distributed(self, vector, unique_database):
> All these tests appear to rely entirely on the runtime profile. Can we also
I added checks to make sure the results overlap properly for each of these 
tests.

I added some checks for the profile metrics for the single-node tests.

I also added checks for the number of entries in the cache at a daemon level 
for the distributed test.
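
For anyone reading without the patch open, the result and cache-entry checks are along these lines. This is a sketch under assumptions, not the helpers in test_tuple_cache.py: the function names are made up, rows are modeled as plain tuples, and the per-daemon entry counts are modeled as a list of integers.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Row = Tuple[object, ...]


def assert_rows_contained(subset: Iterable[Row], superset: Iterable[Row]) -> None:
    """Check that every row of subset appears in superset, counting duplicates."""
    missing = Counter(subset) - Counter(superset)
    assert not missing, f"rows missing from the larger result: {dict(missing)}"


def assert_total_cache_entries(entries_per_daemon: List[int], expected_total: int) -> None:
    """Check the number of cache entries summed across daemons."""
    total = sum(entries_per_daemon)
    assert total == expected_total, f"expected {expected_total} entries, found {total}"
```

Comparing rows as multisets keeps the check independent of row order, which can differ between a cached and an uncached run.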



--
To view, visit http://gerrit.cloudera.org:8080/21541
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe298fff0f644ce931a2aa934ebb98f69aab9d34
Gerrit-Change-Number: 21541
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Kurt Deschler <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Yida Wu <[email protected]>
Gerrit-Comment-Date: Fri, 28 Jun 2024 23:39:50 +0000
Gerrit-HasComments: Yes
