fifteencai has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16869 )


Change subject: IMPALA-9922: A better approach to deal with date's sub-second fractions
......................................................................

IMPALA-9922: A better approach to deal with date's sub-second fractions

In this patch, we relax the constraint on the sub-second fraction in the
format string of to_timestamp().

Previously, the width of the sub-second fraction in the format (the
number of `S` characters) could never exceed the number of fraction
digits in the data; otherwise, the conversion failed. This constraint is
stricter than in many other query engines, such as Hive, Presto, and
ClickHouse, so we have encountered inconsistent conversion results on
datasets with malformed values.
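To make the old constraint concrete, the following is a minimal C++
sketch, not Impala's actual parser code. It assumes that the fraction
digits are read as a decimal fraction of a second and scaled to
nanoseconds, and that `tok_len` is the number of `S` characters in the
format. `ParseFractionStrict` is a hypothetical name, and the single-`S`
case is not modeled.

```cpp
#include <cctype>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical sketch of the OLD, strict rule (not Impala's real code).
// Assumptions: `tok_len` is the number of 'S' characters in the format, and
// the digits are a decimal fraction of a second, scaled to nanoseconds.
std::optional<int64_t> ParseFractionStrict(std::string_view input, int tok_len) {
  // Nanosecond precision allows at most 9 fraction digits.
  if (tok_len < 1 || tok_len > 9) return std::nullopt;
  // The data must supply at least `tok_len` characters,
  // e.g. "12" against "SSS" is rejected here.
  if (static_cast<int>(input.size()) < tok_len) return std::nullopt;
  int64_t value = 0;
  for (int i = 0; i < tok_len; ++i) {
    unsigned char c = static_cast<unsigned char>(input[i]);
    if (!std::isdigit(c)) return std::nullopt;  // Non-digit inside the fraction.
    value = value * 10 + (c - '0');
  }
  // Scale to nanoseconds, e.g. "123" (3 digits) -> 123 * 10^6 ns.
  for (int i = tok_len; i < 9; ++i) value *= 10;
  return value;
}
```

Under these assumptions, `ParseFractionStrict("12", 3)` returns no value,
which corresponds to `.12` failing against `.SSS`.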

To relax this constraint, we made these modifications:
1) Introduced a new parameter `desired_length` alongside `tok_len`;
`desired_length` is the fraction width we want to parse.
2) When the fraction part of the format is a single `S`,
`desired_length` is set to the actual width of the fraction in the
data, keeping the output consistent with the previous logic.
3) In `DatetimeSimpleDateFormatParser`, moved a pointer computation out
of the loop to avoid redundant work.
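The relaxed rule in items 1) and 2) can be sketched in the same way.
This is again a hypothetical illustration rather than the code in this
change: `ParseFractionRelaxed` is an invented name, `tok_len` is assumed
to be the number of `S` characters, and the digits are assumed to form a
decimal fraction of a second. Fewer digits than requested are accepted
as long as at least one is present, and a single `S` sets
`desired_length` to the number of digits actually present (capped at 9,
i.e. nanosecond precision).

```cpp
#include <algorithm>
#include <cctype>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical sketch of the RELAXED rule (not Impala's real code).
// Assumptions: `tok_len` is the number of 'S' characters in the format,
// `desired_length` is the fraction width we want to read, and the digits
// are a decimal fraction of a second, scaled to nanoseconds.
std::optional<int64_t> ParseFractionRelaxed(std::string_view input, int tok_len) {
  if (tok_len < 1 || tok_len > 9) return std::nullopt;
  // Count the leading digits actually present in the data.
  int available = 0;
  while (available < static_cast<int>(input.size()) &&
         std::isdigit(static_cast<unsigned char>(input[available]))) {
    ++available;
  }
  // A single 'S' follows the data width; otherwise use the format width.
  int desired_length = (tok_len == 1) ? std::min(available, 9) : tok_len;
  // Accept fewer digits than desired, but require at least one.
  int width = std::min(available, desired_length);
  if (width == 0) return std::nullopt;
  int64_t value = 0;
  for (int i = 0; i < width; ++i) value = value * 10 + (input[i] - '0');
  // Scale by the digits actually read, e.g. "12" -> 12 * 10^7 ns = 0.12 s.
  for (int i = width; i < 9; ++i) value *= 10;
  return value;
}
```

With these assumptions, `ParseFractionRelaxed("12", 3)` yields
120000000 ns (0.12 s) instead of failing, and
`ParseFractionRelaxed("123456", 1)` reads all six digits.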

========================================================================
Example:
> Before:
select to_timestamp("2020-01-01 18:00:00.12","yyyy-MM-dd HH:mm:ss.SSS")
------------------------------------------------------------------------
> After:
select to_timestamp("2020-01-01 18:00:00.12","yyyy-MM-dd HH:mm:ss.SSS")
========================================================================

Testing:
1. Added 3 test cases to `timestamp-test.cc`.
2. All backend tests pass.

Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7
---
M be/src/runtime/datetime-iso-sql-format-parser.cc
M be/src/runtime/datetime-parser-common.cc
M be/src/runtime/datetime-parser-common.h
M be/src/runtime/datetime-simple-date-format-parser.cc
M be/src/runtime/timestamp-test.cc
5 files changed, 36 insertions(+), 9 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/16869/1
--
To view, visit http://gerrit.cloudera.org:8080/16869

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8e870bb8ad8fd14d388f37dfc5175589ecf9a5a7
Gerrit-Change-Number: 16869
Gerrit-PatchSet: 1
Gerrit-Owner: fifteencai <fifteen...@tencent.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
