Michael Smith has posted comments on this change. ( http://gerrit.cloudera.org:8080/21754 )
Change subject: IMPALA-12908: Add correctness check for tuple cache ...................................................................... Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/21754/1/be/src/exec/tuple-cache-node.cc File be/src/exec/tuple-cache-node.cc: http://gerrit.cloudera.org:8080/#/c/21754/1/be/src/exec/tuple-cache-node.cc@200 PS1, Line 200: cache[str].setCounts(1, 0); Couldn't this just be cache[str].reference_rows_count = 1; The default constructor has already been called. The setCounts method seems unnecessary. http://gerrit.cloudera.org:8080/#/c/21754/1/be/src/exec/tuple-cache-node.cc@216 PS1, Line 216: if (it.second.reference_rows_count <= 0) { This should be impossible. Also both counts are uint64_t, so <0 is meaningless. Maybe just DCHECK that reference_rows_count > 0? http://gerrit.cloudera.org:8080/#/c/21754/1/be/src/exec/tuple-cache-node.cc@260 PS1, Line 260: DCHECK(result == 0); It's possible for external interaction to cause an error (another process deleting the file or changing permissions). So on error I'd argue this should be LOG(ERROR). Same with the else path. http://gerrit.cloudera.org:8080/#/c/21754/1/be/src/exec/tuple-cache-node.cc@351 PS1, Line 351: if (*eos) *eos = debug_eos; Can you add a comment about when this would produce different behavior? I think the answer is if the cache entry has no more rows but the child has additional rows, we need to keep reading from the child. I'd just like to document that. Would the next iteration work correctly? Presumably trying to read from the reader_ again would produce an error? http://gerrit.cloudera.org:8080/#/c/21754/1/be/src/exec/tuple-cache-node.cc@448 PS1, Line 448: // TODO: The name of the subdirectory could be simplified to hash key only if the key I think this should be true now, based on scan ranges. http://gerrit.cloudera.org:8080/#/c/21754/1/be/src/exec/tuple-text-file-reader.h File be/src/exec/tuple-text-file-reader.h: http://gerrit.cloudera.org:8080/#/c/21754/1/be/src/exec/tuple-text-file-reader.h@44 PS1, Line 44: // Returns the size of the file in bytes. Should mention that this leaves the reader at the end of the file. Or update GetFileSize to reset reader_ to its state prior to the call. -- To view, visit http://gerrit.cloudera.org:8080/21754 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ied074e274ebf99fb57e3ee41a13148725775b77c Gerrit-Change-Number: 21754 Gerrit-PatchSet: 1 Gerrit-Owner: Yida Wu <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Kurt Deschler <[email protected]> Gerrit-Reviewer: Michael Smith <[email protected]> Gerrit-Comment-Date: Tue, 10 Sep 2024 21:33:19 +0000 Gerrit-HasComments: Yes
