Dan Hecht has posted comments on this change. Change subject: IMPALA-4725: Query option to control Parquet array resolution. ......................................................................
Patch Set 1: Code-Review+2 (7 comments) Looks fine, but had a couple of suggestions that you can ignore. If you do implement them, I can take another look. http://gerrit.cloudera.org:8080/#/c/6250/1//COMMIT_MSG Commit Message: PS1, Line 11: resoution resolution PS1, Line 29: resoution resolution http://gerrit.cloudera.org:8080/#/c/6250/1/be/src/exec/parquet-metadata-utils.cc File be/src/exec/parquet-metadata-utils.cc: PS1, Line 409: 3) what's this? the number of enum elements? consider defining NUM_ARRAY_ENCODINGS or something as the last element, and using that here (and see below). Line 416: Status one_level_status = Status::OK(); could be: bool missing_fields[NUM_ARRAY_ENCODINGS]; Status statuses[NUM_ARRAY_ENCODINGS]; then below you can just use those directly by indexing with array_encoding. And use for-loops for the stuff toward the end of this function. given that it's very unlikely we'll ever have to add another enum value here, i'm fine with leaving this alone as well, if that's your preference. http://gerrit.cloudera.org:8080/#/c/6250/1/be/src/exec/parquet-metadata-utils.h File be/src/exec/parquet-metadata-utils.h: Line 144: GetOrderedArrayEncodings(array_resolution_, &ordered_array_encodings_); why not do this in the constructor and don't bother saving array_resolution_? That way, there's no state redundancy. Alternatively, and I think even better, we could have a static const table that maps from array_resolution to ordered encodings list which can be used in place of ordered_array_encodings_. That way again we won't have the redundancy. And it's better because then there's no question that the ordering is not something we fiddle with at runtime. The redundancy isn't so bad here, but we generally have too much redundant state which makes it harder to reason about the system, so it's good to avoid redundant state. Without it, for example, there's no question about what influences the array resolution order. Line 190: void GetOrderedArrayEncodings(TParquetArrayResolution::type array_resolution, could be static, then there's no question that array_resolution is the only input. http://gerrit.cloudera.org:8080/#/c/6250/1/common/thrift/ImpalaService.thrift File common/thrift/ImpalaService.thrift: Line 264: // it non-default in a compatibility breaking release. why want a jira for that, or else we'll likely forget to do it. -- To view, visit http://gerrit.cloudera.org:8080/6250 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I4f32e19ec542d4d485154c9d65d0f5e3f9f0a907 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Alex Behm <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-HasComments: Yes
