Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4624: Implement Parquet dictionary filtering
......................................................................


Patch Set 14:

(7 comments)

Flushing out some comments I made while in transit.

I don't have any concerns about correctness but there were a few things that 
may be confusing for people reading the code.

http://gerrit.cloudera.org:8080/#/c/5904/14//COMMIT_MSG
Commit Message:

Line 7: IMPALA-4624: Implement Parquet dictionary filtering
Can you mention the query option in the commit message?


PS14, Line 12: incides
indices


http://gerrit.cloudera.org:8080/#/c/5904/14/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 704:     return Status(Substitute("Could not allocate buffer of $0 bytes for Parquet "
Can you use MemTracker::MemLimitExceeded() here to construct the status? It 
also does some logging that can be useful to diagnose the failure.


http://gerrit.cloudera.org:8080/#/c/5904/14/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

Line 855
Thanks for fixing this. I was talking to Wes McKinney (who works on 
parquet-cpp) a month or so ago and he'd been confused about why Impala was 
writing out encodings it wasn't using.


http://gerrit.cloudera.org:8080/#/c/5904/14/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

PS14, Line 256: GetDictionary
GetDictionaryDecoder() for consistency with the other APIs?


Line 865:     if (!stream_->ReadBytes(data_size, &data_, &status)) return status;
Not your change, but can we just SkipBytes here? That would make the intent 
clearer.


http://gerrit.cloudera.org:8080/#/c/5904/14/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

Line 243:   public boolean isDeterministic() {
I think we should be careful about the naming and comments here, since this 
will return true for many non-deterministic functions - the state of things 
before this patch is pretty confusing. The definition of Expr.isConstant() is 
subtle - the comment on that function tries to define the current rules.

E.g. UDFs can be non-deterministic but the fe treats them as deterministic (for 
now). Or now() isn't really deterministic, but we treat it as such because it 
shouldn't be re-evaluated within a query.

This list of functions is really the builtin functions that have some kind of 
randomisation. Can you rename it to something like isRandomizedBuiltin() and 
update the comment to reflect that?
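
Roughly what I have in mind, as a standalone sketch rather than the actual 
patch: the class name, the helper's signature, and the function names in the 
set are assumptions for illustration only, since the real list lives in 
FunctionCallExpr.java and isn't quoted in this review.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class RandomizedBuiltinSketch {
  // Hypothetical list for illustration; the patch's actual list may differ.
  private static final Set<String> RANDOMIZED_BUILTINS =
      Collections.unmodifiableSet(new HashSet<>(Arrays.asList("rand", "random", "uuid")));

  /**
   * Returns true only for builtin functions that involve some kind of
   * randomisation. This is deliberately narrower than "non-deterministic":
   * UDFs and now() return false here even though they are not strictly
   * deterministic, because the frontend currently treats them as such.
   */
  public static boolean isRandomizedBuiltin(String fnName) {
    return RANDOMIZED_BUILTINS.contains(fnName.toLowerCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    System.out.println("rand: " + isRandomizedBuiltin("rand"));
    System.out.println("now: " + isRandomizedBuiltin("now"));
  }
}
```

The point of the narrower name is that a false result makes no claim about 
determinism in general, only that the function is not on the randomised list.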


-- 
To view, visit http://gerrit.cloudera.org:8080/5904
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3a7cc3bd0523fbf3c79bd924219e909ef671cfd7
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Marcel Kornacker <[email protected]>
Gerrit-Reviewer: Matthew Mulder <[email protected]>
Gerrit-Reviewer: Mostafa Mokhtar <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
