Lars Volker has posted comments on this change.

Change subject: IMPALA-4624: Implement Parquet dictionary filtering
......................................................................


Patch Set 9:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/5904/9/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 581:   /* Nested types are not supported yet */
nit: use // style comment
Is there a JIRA to add support for nested types?


Line 661:         if (enc_stat.encoding != parquet::Encoding::PLAIN_DICTIONARY) 
{
Can the count ever be zero? The parquet.thrift file seems to allow it.


Line 735:     if (dict_filter_it == scanner_dict_filter_map_.end()) 
DCHECK(false);
You could use DCHECK(a != b)


http://gerrit.cloudera.org:8080/#/c/5904/5/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

Line 439:   /// These are pointers into elements of column_readers_
> You can use a ScoperBuffer here, which allows the memory to be tracked.
Did this slip through the cracks?


http://gerrit.cloudera.org:8080/#/c/5904/9/be/src/exec/hdfs-parquet-scanner.h
File be/src/exec/hdfs-parquet-scanner.h:

PS9, Line 435: into
s/into/to ?


Line 446:   Tuple* dict_filter_tuple_;
I'd initialize this when you need it, since it is just the result of a cast and 
doesn't take up space, but this way adds to the list of fields (and size of the 
header).


PS9, Line 618: (
Why the (?


Line 623:   /// Returns true if the column chunk is 100% dictionary encoded
Can you make this more concrete? For example "if all pages of the column 
chunk...".


PS9, Line 630: row_group_eliminated
Isn't the caller responsible for eliminating the row group? "_eliminated" 
sounds like as a side effect the elimination will be done.


http://gerrit.cloudera.org:8080/#/c/5904/9/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

Line 254:   map<Encoding::type, int32_t> dict_encoding_stats_;
why not unordered_map? Also we I think we prefer int over int32_t


http://gerrit.cloudera.org:8080/#/c/5904/9/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

PS9, Line 499: NeedsCoversion
NeedsConversion


http://gerrit.cloudera.org:8080/#/c/5904/9/be/src/exec/parquet-column-readers.h
File be/src/exec/parquet-column-readers.h:

PS9, Line 427: hext_header_size
*next_header_size


http://gerrit.cloudera.org:8080/#/c/5904/9/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

Line 229:    * TODO: It should be possible to use this in isConstantImpl.
You could pass the function name into isDeterministic and then re-use this in 
isConstantImpl to resolve the todo


-- 
To view, visit http://gerrit.cloudera.org:8080/5904
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3a7cc3bd0523fbf3c79bd924219e909ef671cfd7
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Matthew Mulder <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-HasComments: Yes

Reply via email to