Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4624: Implement Parquet dictionary filtering
......................................................................

Patch Set 14:

(7 comments)

Flushing out some comments I made while in transit. I don't have any concerns about correctness but there were a few things that may be confusing for people reading the code.

http://gerrit.cloudera.org:8080/#/c/5904/14//COMMIT_MSG
Commit Message:

Line 7: IMPALA-4624: Implement Parquet dictionary filtering
Can you mention the query option in the commit message?


PS14, Line 12: incides
indices


http://gerrit.cloudera.org:8080/#/c/5904/14/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 704: return Status(Substitute("Could not allocate buffer of $0 bytes for Parquet "
Can you use MemTracker::MemLimitExceeded() here to construct the status? It also does some logging that can be useful to diagnose the failure.


http://gerrit.cloudera.org:8080/#/c/5904/14/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

Line 855:
Thanks for fixing this. I was talking to Wes McKinney (who works on parquet-cpp) a month or so ago and he'd been confused about why Impala was writing out encodings it wasn't using.


http://gerrit.cloudera.org:8080/#/c/5904/14/be/src/exec/parquet-column-readers.cc
File be/src/exec/parquet-column-readers.cc:

PS14, Line 256: GetDictionary
GetDictionaryDecoder() for consistency with the other APIs?


Line 865: if (!stream_->ReadBytes(data_size, &data_, &status)) return status;
Not your change, but can we just SkipBytes here? That would make the intent clearer.


http://gerrit.cloudera.org:8080/#/c/5904/14/fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java
File fe/src/main/java/org/apache/impala/analysis/FunctionCallExpr.java:

Line 243: public boolean isDeterministic() {
I think we should be careful about the naming and comments here, since this will return true for many non-deterministic functions - the state of things before this patch is pretty confusing.

The definition of Expr.isConstant() is subtle - the comment on that function tries to define the current rules. E.g. UDFs can be non-deterministic but the fe treats them as deterministic (for now). Or now() isn't really deterministic, but we treat it as such because it shouldn't be re-evaluated within a query.

This list of functions is really the builtin functions that have some kind of randomisation. Can you rename it to something like isRandomizedBuiltin() and update the comment to reflect that?


--
To view, visit http://gerrit.cloudera.org:8080/5904
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I3a7cc3bd0523fbf3c79bd924219e909ef671cfd7
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Joe McDonnell
Gerrit-Reviewer: Alex Behm
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Joe McDonnell
Gerrit-Reviewer: Lars Volker
Gerrit-Reviewer: Marcel Kornacker
Gerrit-Reviewer: Matthew Mulder
Gerrit-Reviewer: Mostafa Mokhtar
Gerrit-Reviewer: Tim Armstrong
Gerrit-HasComments: Yes
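
As a rough illustration of the rename suggested in the FunctionCallExpr.java comment above, a name-based check could look something like the sketch below. The class name RandomizedBuiltins, the method signature, and the function names in the set are all assumptions made for illustration; the patch's actual list is not quoted in this review.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not the code from the patch under review.
final class RandomizedBuiltins {
    // Assumed placeholder list of builtins that involve randomisation.
    private static final Set<String> RANDOMIZED_NAMES = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("rand", "random", "uuid")));

    private RandomizedBuiltins() {}

    /**
     * Returns true only for builtin functions whose result involves some
     * kind of randomisation. Returning false does NOT mean the function is
     * deterministic: UDFs and functions such as now() also return false here.
     */
    static boolean isRandomizedBuiltin(String fnName, boolean isBuiltin) {
        if (!isBuiltin || fnName == null) return false;
        // Compare case-insensitively by normalising to lower case.
        return RANDOMIZED_NAMES.contains(fnName.toLowerCase(Locale.ROOT));
    }
}
```

Naming the check after what it actually tests keeps callers from reading a false result as a guarantee of determinism, which is the confusion the comment is trying to avoid.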
