Attila Jeges has posted comments on this change. ( http://gerrit.cloudera.org:8080/13189 )
Change subject: IMPALA-7370: DATE: Read/Write to parquet. ...................................................................... Patch Set 2: (13 comments) http://gerrit.cloudera.org:8080/#/c/13189/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/13189/2//COMMIT_MSG@12 PS2, Line 12: Parquet uses DATE logical type for dates. DATE logical type annotates : an INT32 that stores the number of days from the Unix epoch, 1 January : 1970. > Gregorian stuff + Hive could be mentioned here, and I would also prefer to Done. Added some e2e tests too. http://gerrit.cloudera.org:8080/#/c/13189/2/be/src/exec/parquet/parquet-column-readers.cc File be/src/exec/parquet/parquet-column-readers.cc: http://gerrit.cloudera.org:8080/#/c/13189/2/be/src/exec/parquet/parquet-column-readers.cc@957 PS2, Line 957: // The range was already checked during the int32_t->DateValue conversion, which : // sets the date to invalid if it was out of range. > optional: I'll leave it like this for now. We can figure out a better way later. http://gerrit.cloudera.org:8080/#/c/13189/2/be/src/exec/parquet/parquet-column-stats.inline.h File be/src/exec/parquet/parquet-column-stats.inline.h: http://gerrit.cloudera.org:8080/#/c/13189/2/be/src/exec/parquet/parquet-column-stats.inline.h@171 PS2, Line 171: DateValue* result = reinterpret_cast<DateValue*>(slot); : int size = buffer.size(); : const uint8_t* data = reinterpret_cast<const uint8_t*>(buffer.data()); : if (ParquetPlainEncoder::Decode<DateValue, parquet::Type::INT32>(data, data + size, : size, result) == -1) { : return false; : } > This looks very similar to line 148-155. It would be nice to extract it int Done http://gerrit.cloudera.org:8080/#/c/13189/2/be/src/exec/parquet/parquet-common.h File be/src/exec/parquet/parquet-common.h: http://gerrit.cloudera.org:8080/#/c/13189/2/be/src/exec/parquet/parquet-common.h@370 PS2, Line 370: i > nit: "i" suggests a loop variable to me, so I would prefer a name like tmp Done http://gerrit.cloudera.org:8080/#/c/13189/2/common/thrift/generate_error_codes.py File common/thrift/generate_error_codes.py: http://gerrit.cloudera.org:8080/#/c/13189/2/common/thrift/generate_error_codes.py@408 PS2, Line 408: ("PARQUET_DATE_OUT_OF_RANGE", 134, : "Parquet file '$0' column '$1' contains an out of range date. " : "The valid date range is 0000-01-01..9999-12-31."), > Can you create file with a corrupt value to cover related paths? I've added the e2e test. This test revealed a missing bit in my implementation, thanks! http://gerrit.cloudera.org:8080/#/c/13189/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java File fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java: http://gerrit.cloudera.org:8080/#/c/13189/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1267 PS2, Line 1267: // Right now, scanning DATE values is only supported for TEXT and PARQUET fileformats. > I feel that listing unsupported formats could be done right above findUnsup Done http://gerrit.cloudera.org:8080/#/c/13189/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1341 PS2, Line 1341: ff != HdfsFileFormat.TEXT && ff != HdfsFileFormat.PARQUET > A function like "IsTypeSupported()" or "GetSupportedType()" could be added Done http://gerrit.cloudera.org:8080/#/c/13189/2/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java@1341 PS2, Line 1341: if (ff != HdfsFileFormat.TEXT && ff != HdfsFileFormat.PARQUET) { : return part; : } > nit: can this be merged into one line? Done http://gerrit.cloudera.org:8080/#/c/13189/2/testdata/workloads/functional-query/queries/QueryTest/date-fileformat-support.test File testdata/workloads/functional-query/queries/QueryTest/date-fileformat-support.test: http://gerrit.cloudera.org:8080/#/c/13189/2/testdata/workloads/functional-query/queries/QueryTest/date-fileformat-support.test@7 PS2, Line 7: ---- QUERY > nit: Would it make sense to merge this step with the previous one as it doe Added expected results to the previous section. I think it makes more sense than merging the two test cases together. http://gerrit.cloudera.org:8080/#/c/13189/2/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test File testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test: http://gerrit.cloudera.org:8080/#/c/13189/2/testdata/workloads/functional-query/queries/QueryTest/parquet-filtering.test@236 PS2, Line 236: # Note: dictionary filtering currently does not work on dates > IMPALA-4994 could be mentioned here, it would make it easier to cleanup rel Done http://gerrit.cloudera.org:8080/#/c/13189/2/tests/common/impala_connection.py File tests/common/impala_connection.py: http://gerrit.cloudera.org:8080/#/c/13189/2/tests/common/impala_connection.py@285 PS2, Line 285: r = None : try: : r = self.__fetch_results(handle, profile_format=profile_format) : finally: : if r is None: : # Try to close the query handle but ignore any exceptions not to overwrite the : # original exception raised by '__fetch_results'. : try: : self.close_query(handle) : except Exception: : pass : else: : self.close_query(handle) : return r > Is this an intentional change for "date on parquet"? Isn't this an unrelate I decided to include this fix here as it broke some of the hs2 DATE tests in test_date_queries.py. I could create a separate patch set only for this, although probably that would be an overkill. http://gerrit.cloudera.org:8080/#/c/13189/2/tests/query_test/test_insert_parquet.py File tests/query_test/test_insert_parquet.py: http://gerrit.cloudera.org:8080/#/c/13189/2/tests/query_test/test_insert_parquet.py@667 PS2, Line 667: values. > nit: Date could be mentioned in this comment + the reason for handling it s Done http://gerrit.cloudera.org:8080/#/c/13189/2/tests/query_test/test_insert_parquet.py@669 PS2, Line 669: # Expected values for functional.date_tbl > nit: for me this comment doesn't add any value Removed it. -- To view, visit http://gerrit.cloudera.org:8080/13189 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I67da03754531660bc8de3b6935580d46deae1814 Gerrit-Change-Number: 13189 Gerrit-PatchSet: 2 Gerrit-Owner: Attila Jeges <[email protected]> Gerrit-Reviewer: Attila Jeges <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Comment-Date: Fri, 03 May 2019 15:15:03 +0000 Gerrit-HasComments: Yes
