Gabor Kaszab has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11984 )

Change subject: IMPALA-7853: Add support to read int64 NANO timestamps from 
Parquet
......................................................................


Patch Set 2:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.h
File be/src/exec/parquet-common.h:

http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.h@479
PS2, Line 479: LimitedRange
Dows this "LimitedRange" part of the name add extra value? Am I right that we 
don't have any other kind of "Timestamp with Nano precision"? Then we can refer 
to this implementation simply as UnixTimeNanos in my opinion.


http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.h@507
PS2, Line 507:   /// conversion is necessary. Returns true if the schema 
describes a valid timestamp
Could you please comment what the parameters are for, pls?


http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.h@509
PS2, Line 509: ProcessSchema
I feel that something like GetTimestampInfoFromSchema might be a bit more 
self-descriptive. There might be even better names :)


http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc
File be/src/exec/parquet-common.cc:

http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@102
PS2, Line 102:   // Timestamps can be only encoded as INT64 or INT96.
nit: You might want to adjust this comment to cover the case when this function 
is called on non-timestamp related types e.g. string.


http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@106
PS2, Line 106: Newer solution,
nit: No need for this part. In addition, it's not guaranteed that reading this 
comment 2 years from now would reveal what "Newer solution" is referred to.


http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@122
PS2, Line 122: Older solution,
No need for "Older solution" part of the comment.


http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@123
PS2, Line 123: are/were never
nit: I think we shouldn't refer to what happened before this change as we 
should only talk about the current code not the previous code.


http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@148
PS2, Line 148:   if (e.type == parquet::Type::INT96 && 
convert_int96_timestamps) needs_conversion = true;
Shouldn't this line be also included in ProcessSchema? Since you introduced a 
function to set 'needs_conversion' I think it should tackle all the cases not 
just a subset of them.


http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-metadata-utils.cc
File be/src/exec/parquet-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-metadata-utils.cc@71
PS2, Line 71:   return ParquetTimestampDecoder::ProcessSchema(element, 
precision, needs_conversion);
Just thinking out loud here: This ProcessSchema() is now used for 1) checking 
if the timestamp in the referred element is a valid timestamp and 2) to gather 
information such as 'populate' and 'precision'. Have you considered splitting 
these 2 functionalities?



--
To view, visit http://gerrit.cloudera.org:8080/11984
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I932396d8646f43c0b9ca4a6359f164c4d8349d8f
Gerrit-Change-Number: 11984
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Comment-Date: Tue, 27 Nov 2018 14:17:17 +0000
Gerrit-HasComments: Yes

Reply via email to