Lars Volker has posted comments on this change. Change subject: IMPALA-4624: Implement Parquet dictionary filtering ......................................................................
Patch Set 9: (13 comments) http://gerrit.cloudera.org:8080/#/c/5904/9/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 581: /* Nested types are not supported yet */ nit: use // style comment Is there a JIRA to add support for nested types? Line 661: if (enc_stat.encoding != parquet::Encoding::PLAIN_DICTIONARY) { Can the count ever be zero? The parquet.thrift file seems to allow it. Line 735: if (dict_filter_it == scanner_dict_filter_map_.end()) DCHECK(false); You could use DCHECK(a != b) http://gerrit.cloudera.org:8080/#/c/5904/5/be/src/exec/hdfs-parquet-scanner.h File be/src/exec/hdfs-parquet-scanner.h: Line 439: /// These are pointers into elements of column_readers_ > You can use a ScoperBuffer here, which allows the memory to be tracked. Did this slip through the cracks? http://gerrit.cloudera.org:8080/#/c/5904/9/be/src/exec/hdfs-parquet-scanner.h File be/src/exec/hdfs-parquet-scanner.h: PS9, Line 435: into s/into/to ? Line 446: Tuple* dict_filter_tuple_; I'd initialize this when you need it, since it is just the result of a cast and doesn't take up space, but this way adds to the list of fields (and size of the header). PS9, Line 618: ( Why the (? Line 623: /// Returns true if the column chunk is 100% dictionary encoded Can you make this more concrete? For example "if all pages of the column chunk...". PS9, Line 630: row_group_eliminated Isn't the caller responsible for eliminating the row group? "_eliminated" sounds like as a side effect the elimination will be done. http://gerrit.cloudera.org:8080/#/c/5904/9/be/src/exec/hdfs-parquet-table-writer.cc File be/src/exec/hdfs-parquet-table-writer.cc: Line 254: map<Encoding::type, int32_t> dict_encoding_stats_; why not unordered_map? Also we I think we prefer int over int32_t http://gerrit.cloudera.org:8080/#/c/5904/9/be/src/exec/parquet-column-readers.cc File be/src/exec/parquet-column-readers.cc: PS9, Line 499: NeedsCoversion NeedsConversion http://gerrit.cloudera.org:8080/#/c/5904/9/be/src/exec/parquet-column-readers.h File be/src/exec/parquet-column-readers.h: PS9, Line 427: hext_header_size *next_header_size http://gerrit.cloudera.org:8080/#/c/5904/9/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java: Line 229: * TODO: It should be possible to use this in isConstantImpl. You could pass the function name into isDeterministic and then re-use this in isConstantImpl to resolve the todo -- To view, visit http://gerrit.cloudera.org:8080/5904 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I3a7cc3bd0523fbf3c79bd924219e909ef671cfd7 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Joe McDonnell <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Joe McDonnell <[email protected]> Gerrit-Reviewer: Lars Volker <[email protected]> Gerrit-Reviewer: Marcel Kornacker <[email protected]> Gerrit-Reviewer: Matthew Mulder <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-HasComments: Yes
