Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8775 )
Change subject: IMPALA-4993: extend dictionary filtering to collections ...................................................................... Patch Set 6: (6 comments) Took a detailed look at the backend http://gerrit.cloudera.org:8080/#/c/8775/6/be/src/exec/hdfs-parquet-scanner.h File be/src/exec/hdfs-parquet-scanner.h: http://gerrit.cloudera.org:8080/#/c/8775/6/be/src/exec/hdfs-parquet-scanner.h@462 PS6, Line 462: std::unordered_map<const TupleDescriptor*, std::unique_ptr<ScopedBuffer>> It would make more sense at this point to use a MemPool instead of a proliferation of ScopedBuffers in dict_filter_tuple_map_ and min_max_tuple_buffer_ - it's the canonical way to make multiple small allocations that are freed at the same time. It would have made sense even before this patch. level_cache_pool_ already has the right lifetime (freed in close) so we could just rename that to something more generic that reflects the lifetime. E.g. we call something similar a "perm_pool" elsewhere because its contents have the same lifetime as the owning object. http://gerrit.cloudera.org:8080/#/c/8775/6/be/src/exec/hdfs-parquet-scanner.h@649 PS6, Line 649: /// Gets the TupleDescriptor of slot_desc. Mention that 'slot_desc' can belong to the top-level tuple or a tuple nested in a collection? http://gerrit.cloudera.org:8080/#/c/8775/6/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: http://gerrit.cloudera.org:8080/#/c/8775/6/be/src/exec/hdfs-parquet-scanner.cc@a1620 PS6, Line 1620: thanks :) http://gerrit.cloudera.org:8080/#/c/8775/6/be/src/exec/hdfs-parquet-scanner.cc@771 PS6, Line 771: if (!col_reader->IsCollectionReader()) { nit: I think this would be easier to follow if we reversed the branches and removed the negation. http://gerrit.cloudera.org:8080/#/c/8775/6/be/src/exec/hdfs-parquet-scanner.cc@815 PS6, Line 815: for (auto* col_reader : column_readers_) { nit: could fit on one line now http://gerrit.cloudera.org:8080/#/c/8775/6/be/src/exec/hdfs-parquet-scanner.cc@1657 PS6, Line 1657: if (column_readers.empty()) return Status::OK(); Is the early exit necessary for correctness? Might be worth mentioning if it is. Otherwise, I don't think it matters for performance so my bias is towards leaving it out. -- To view, visit http://gerrit.cloudera.org:8080/8775 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If3a2abcfc3d0f7d18756816659fed77ce12668dd Gerrit-Change-Number: 8775 Gerrit-PatchSet: 6 Gerrit-Owner: Vuk Ercegovac <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Vuk Ercegovac <[email protected]> Gerrit-Comment-Date: Tue, 09 Jan 2018 19:23:40 +0000 Gerrit-HasComments: Yes
