Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/15222 )
Change subject: IMPALA-9385: Unix time conversion cleanup + ORC fix ...................................................................... Patch Set 6: (10 comments) Thanks for the review! I fixed the ones I were sure about, will return to some of the more complex ones. http://gerrit.cloudera.org:8080/#/c/15222/6//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/15222/6//COMMIT_MSG@13 PS6, Line 13: is was > was no ? Done http://gerrit.cloudera.org:8080/#/c/15222/6//COMMIT_MSG@26 PS6, Line 26: that > nit: not necessary Done http://gerrit.cloudera.org:8080/#/c/15222/6/be/src/common/global-types.h File be/src/common/global-types.h: http://gerrit.cloudera.org:8080/#/c/15222/6/be/src/common/global-types.h@31 PS6, Line 31: #define UTCPTR nullptr > Why no define a const Timezone* instead? It led to compile error for some reason. I will try it again, probably it should be const Timezone * const http://gerrit.cloudera.org:8080/#/c/15222/7/be/src/exec/data-source-scan-node.cc File be/src/exec/data-source-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/15222/7/be/src/exec/data-source-scan-node.cc@352 PS7, Line 352: // TODO The timezone depends on flag use_local_tz_for_unix_timestamp_conversions. : // Check if this is the intended behaviour. : RETURN_IF_ERROR(MaterializeNextRow( : state->time_zone_for_unix_time_conversions(), tuple_pool, tuple)); > You're raising a good point in the comment. I think here we should just pas I was thinking about UTCPTR instead. Using local_time_zone() would mean that we will do a timezone conversion by default, which is not how Impala used to work. Note that I am not sure whether anyone cares about the correctness of DataSource. http://gerrit.cloudera.org:8080/#/c/15222/6/be/src/exprs/expr-test.cc File be/src/exprs/expr-test.cc: http://gerrit.cloudera.org:8080/#/c/15222/6/be/src/exprs/expr-test.cc@170 PS6, Line 170: const Timezone& > Returning const Timezone* here might simplify things a bit. This cannot return nullptr, as it is also used in UtcToLocal() which expects a Timezone reference, so I think that it is clearer to keep it this way. http://gerrit.cloudera.org:8080/#/c/15222/6/be/src/runtime/timestamp-value.h File be/src/runtime/timestamp-value.h: http://gerrit.cloudera.org:8080/#/c/15222/6/be/src/runtime/timestamp-value.h@102 PS6, Line 102: 'unix_time' is assumed to be UTC > nit: The comment is a bit confusing. 'unix_time' is always in UTC (by defin I didn't find a good name for this offset when used in timezone agnostic context. E.g. both Parquet and Orc store such offsets that behave exactly like Unix time, while assume the resulting timestamp to be in local time. We consider this the "normal" case and pass a pointer to a Timezone when the unix_time is real UTC Unix time and we need to convert it to local time. http://gerrit.cloudera.org:8080/#/c/15222/6/testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-local-tz-conversion.test File testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-local-tz-conversion.test: http://gerrit.cloudera.org:8080/#/c/15222/6/testdata/workloads/functional-query/queries/QueryTest/out-of-range-timestamp-local-tz-conversion.test@3 PS6, Line 3: This test is also called with convert_legacy_hive_parquet_utc_timestamps=true. > Comment is confusing: I think the test is only called with convert_legacy_h You are right, I confused this test with another one. http://gerrit.cloudera.org:8080/#/c/15222/6/testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test File testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test: http://gerrit.cloudera.org:8080/#/c/15222/6/testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test@18 PS6, Line 18: ==== : ---- QUERY > Move the new sections to a separate .test file or rename this file to somet Done http://gerrit.cloudera.org:8080/#/c/15222/6/testdata/workloads/functional-query/queries/QueryTest/utc-timestamp-functions.test@45 PS6, Line 45: ---- QUERY : SET timezone=CET; : select min(timestamp_col) from functional_avro.alltypestiny; : ---- TYPES : STRING : ---- RESULTS : '2009-01-01 00:00:00' > Since functional_avro.alltypestiny.timestamp_col is a string so probably yo I kept the test - if we add timestamp support for avro one day, it will be easy to forget about checking this. http://gerrit.cloudera.org:8080/#/c/15222/6/tests/custom_cluster/test_hive_parquet_timestamp_conversion.py File tests/custom_cluster/test_hive_parquet_timestamp_conversion.py: http://gerrit.cloudera.org:8080/#/c/15222/6/tests/custom_cluster/test_hive_parquet_timestamp_conversion.py@74 PS6, Line 74: self.check_sanity(True) : # Test with UTC too to check the optimizations added in IMPALA-9385. : for tz_name in ["PST8PDT", "UTC"]: : # The value read from the Hive table should be the same as reading a UTC converted : # value from the Impala table. : data = self.execute_query_expect_success(self.client, """ : SELECT h.id, h.day, h.timestamp_col, i.timestamp_col : FROM functional_parquet.alltypesagg_hive_13_1 h : JOIN functional_parquet.alltypesagg : i ON i.id = h.id AND i.day = h.day -- serves as a unique key : WHERE : (h.timestamp_col IS NULL AND i.timestamp_col IS NOT NULL) : OR (h.timestamp_col IS NOT NULL AND i.timestamp_col IS NULL) : OR h.timestamp_col != FROM_UTC_TIMESTAMP(i.timestamp_col, '%s') : """ % tz_name, query_options={"timezone": tz_name})\ : .get_data() : assert len(data) == 0 > Any reason you moved this block after L73? Done -- To view, visit http://gerrit.cloudera.org:8080/15222 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I14e2a7e512ccd013d5d9fe480a5467ed4c46b76e Gerrit-Change-Number: 15222 Gerrit-PatchSet: 6 Gerrit-Owner: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Attila Jeges <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Norbert Luksa <[email protected]> Gerrit-Reviewer: Quanlong Huang <[email protected]> Gerrit-Comment-Date: Tue, 18 Feb 2020 20:43:07 +0000 Gerrit-HasComments: Yes
