Riza Suminto has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22293 )

Change subject: IMPALA-13627: Handle legacy Hive timezone conversion
......................................................................


Patch Set 11:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/22293/11/be/src/benchmarks/convert-timestamp-benchmark.cc
File be/src/benchmarks/convert-timestamp-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/22293/11/be/src/benchmarks/convert-timestamp-benchmark.cc@115
PS11, Line 115: FromUnixTimeNanos:         Function  iters/ms   10%ile   50%ile 
  90%ile     10%ile     50%ile     90%ile
              :                                                                 
         (relative) (relative) (relative)
              : 
---------------------------------------------------------------------------------------------------------
              :                   (sec split (old))               31.2       32 
    32.4         1X         1X         1X
              :                   (sec split (new))               16.3     16.5 
    16.6     0.523X     0.517X     0.513X
              :
              : FromSubsecondUnixTime:     Function  iters/ms   10%ile   50%ile 
  90%ile     10%ile     50%ile     90%ile
              :                                                                 
         (relative) (relative) (relative)
              : 
---------------------------------------------------------------------------------------------------------
              :                               (old)                 32     32.5 
      33         1X         1X         1X
              :                               (new)               14.7     14.9 
    15.1      0.46X     0.458X     0.458X
Why the number looks regress here? Relative of (new) < Relative of (old). Did 
they switch place?


http://gerrit.cloudera.org:8080/#/c/22293/11/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/22293/11/common/thrift/ImpalaService.thrift@981
PS11, Line 981: convert_legacy_hive_parquet_utc_timestamps
nit: Please uppercase to highlight that this is an Impala query option.


http://gerrit.cloudera.org:8080/#/c/22293/11/docs/topics/impala_timestamp.xml
File docs/topics/impala_timestamp.xml:

http://gerrit.cloudera.org:8080/#/c/22293/11/docs/topics/impala_timestamp.xml@292
PS11, Line 292: hive.parquet.timestamp.legacy.conversion.enable
hive.parquet.timestamp.legacy.conversion.enabled


http://gerrit.cloudera.org:8080/#/c/22293/11/tests/query_test/test_hive_timestamp_conversion.py
File tests/query_test/test_hive_timestamp_conversion.py:

http://gerrit.cloudera.org:8080/#/c/22293/11/tests/query_test/test_hive_timestamp_conversion.py@18
PS11, Line 18: # Tests for Hive-IMPALA compression codec interoperability
nit: This is better as code comment for TestHiveParquetTimestampConversion 
class.

Please also mention that convert_legacy_hive_parquet_utc_timestamps, 
use_legacy_hive_timestamp_conversion, and timezone options are important to 
exercise, and they are tested at each .test file in this test.


http://gerrit.cloudera.org:8080/#/c/22293/11/tests/query_test/test_hive_timestamp_conversion.py@32
PS11, Line 32:   def add_test_dimensions(cls):
             :     super(TestHiveParquetTimestampConversion, 
cls).add_test_dimensions()
Individual .test file is short. Therefore, tests in this class seems better to 
evaluate by querying directly and then validate the rows in the py.test file 
rather than using run_test_case.

The possible results are only 2 set, depending on whether both 
convert_legacy_hive_parquet_utc_timestamps and 
use_legacy_hive_timestamp_conversion is all True or not. You can exercise 4 
permutations by adding:

  add_exec_option_dimension(cls, 'convert_legacy_hive_parquet_utc_timestamps', 
[False, True])
  add_exec_option_dimension(cls, 'use_legacy_hive_timestamp_conversion', 
[False, True])


http://gerrit.cloudera.org:8080/#/c/22293/11/tests/query_test/test_hive_timestamp_conversion.py@55
PS11, Line 55: writer.time.zone
What is the value of this metadata in employee_hive_3_1_3_us_pacific.parquet?


http://gerrit.cloudera.org:8080/#/c/22293/11/tests/query_test/test_hive_timestamp_conversion.py@67
PS11, Line 67:     """Test table containing Hive legacy timestamps written with 
Hive prior to 3.1.3.
             :
             :     Test files from
             :     
https://github.com/apache/hive/tree/rel/release-4.0.1/data/files/tbl_parq1.""";
Can you mention something about timezone="Asia/Singapore" inside 
timestamp-conversion-hive-3m.test ?



--
To view, visit http://gerrit.cloudera.org:8080/22293
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1271ed1da0b74366ab8315e7ec2d4ee47111e067
Gerrit-Change-Number: 22293
Gerrit-PatchSet: 11
Gerrit-Owner: Michael Smith <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Michael Smith <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Comment-Date: Thu, 06 Feb 2025 19:52:08 +0000
Gerrit-HasComments: Yes

Reply via email to