Dan Hecht has posted comments on this change.

Change subject: IMPALA-4725: Query option to control Parquet array resolution.
......................................................................


Patch Set 1: Code-Review+2

(7 comments)

Looks fine, but had a couple of suggestions that you can ignore. If you do 
implement them, I can take another look.

http://gerrit.cloudera.org:8080/#/c/6250/1//COMMIT_MSG
Commit Message:

PS1, Line 11: resoution
resolution


PS1, Line 29: resoution
resolution


http://gerrit.cloudera.org:8080/#/c/6250/1/be/src/exec/parquet-metadata-utils.cc
File be/src/exec/parquet-metadata-utils.cc:

PS1, Line 409: 3)
what's this? the number of enum elements? consider defining NUM_ARRAY_ENCODINGS 
or something as the last element, and using that here (and see below).


Line 416:   Status one_level_status = Status::OK();
could be:

bool missing_fields[NUM_ARRAY_ENCODINGS];
Status statuses[NUM_ARRAY_ENCODINGS];

then below you can just use those directly by indexing with array_encoding. And 
use for-loops for the stuff toward the end of this function.

given that it's very unlikely we'll ever have to add another enum value here, 
i'm fine with leaving this alone as well, if that's your preference.


http://gerrit.cloudera.org:8080/#/c/6250/1/be/src/exec/parquet-metadata-utils.h
File be/src/exec/parquet-metadata-utils.h:

Line 144:     GetOrderedArrayEncodings(array_resolution_, 
&ordered_array_encodings_);
why not do this in the constructor and don't bother saving array_resolution_? 
That way, there's no state redundancy.

Alternatively, and I think even better, we could have a static const table that 
maps from array_resolution to ordered encodings list which can be used in place 
of ordered_array_encodings_.  That way again we won't have the redundancy. And 
it's better because then there's no question that the ordering is not something 
we fiddle with at runtime.

The redundancy isn't so bad here, but we generally have too much redundant 
state which makes it harder to reason about the system, so it's good to avoid 
redundant state.  Without it, for example, there's no question about what 
influences the array resolution order.


Line 190:   void GetOrderedArrayEncodings(TParquetArrayResolution::type 
array_resolution,
could be static, then there's no question that array_resolution is the only 
input.


http://gerrit.cloudera.org:8080/#/c/6250/1/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

Line 264:   // it non-default in a compatibility breaking release.
why want a jira for that, or else we'll likely forget to do it.


-- 
To view, visit http://gerrit.cloudera.org:8080/6250
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I4f32e19ec542d4d485154c9d65d0f5e3f9f0a907
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Alex Behm <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-HasComments: Yes

Reply via email to