Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/11984 )
Change subject: IMPALA-7853: Add support to read int64 NANO timestamps from Parquet ...................................................................... Patch Set 2: (9 comments) http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.h File be/src/exec/parquet-common.h: http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.h@479 PS2, Line 479: LimitedRange Dows this "LimitedRange" part of the name add extra value? Am I right that we don't have any other kind of "Timestamp with Nano precision"? Then we can refer to this implementation simply as UnixTimeNanos in my opinion. http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.h@507 PS2, Line 507: /// conversion is necessary. Returns true if the schema describes a valid timestamp Could you please comment what the parameters are for, pls? http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.h@509 PS2, Line 509: ProcessSchema I feel that something like GetTimestampInfoFromSchema might be a bit more self-descriptive. There might be even better names :) http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc File be/src/exec/parquet-common.cc: http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@102 PS2, Line 102: // Timestamps can be only encoded as INT64 or INT96. nit: You might want to adjust this comment to cover the case when this function is called on non-timestamp related types e.g. string. http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@106 PS2, Line 106: Newer solution, nit: No need for this part. In addition, it's not guaranteed that reading this comment 2 years from now would reveal what "Newer solution" is referred to. http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@122 PS2, Line 122: Older solution, No need for "Older solution" part of the comment. http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@123 PS2, Line 123: are/were never nit: I think we shouldn't refer to what happened before this change as we should only talk about the current code not the previous code. http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-common.cc@148 PS2, Line 148: if (e.type == parquet::Type::INT96 && convert_int96_timestamps) needs_conversion = true; Shouldn't this line be also included in ProcessSchema? Since you introduced a function to set 'needs_conversion' I think it should tackle all the cases not just a subset of them. http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-metadata-utils.cc File be/src/exec/parquet-metadata-utils.cc: http://gerrit.cloudera.org:8080/#/c/11984/2/be/src/exec/parquet-metadata-utils.cc@71 PS2, Line 71: return ParquetTimestampDecoder::ProcessSchema(element, precision, needs_conversion); Just thinking out loud here: This ProcessSchema() is now used for 1) checking if the timestamp in the referred element is a valid timestamp and 2) to gather information such as 'populate' and 'precision'. Have you considered splitting these 2 functionalities? -- To view, visit http://gerrit.cloudera.org:8080/11984 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I932396d8646f43c0b9ca4a6359f164c4d8349d8f Gerrit-Change-Number: 11984 Gerrit-PatchSet: 2 Gerrit-Owner: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Comment-Date: Tue, 27 Nov 2018 14:17:17 +0000 Gerrit-HasComments: Yes
