cloud-fan commented on a change in pull request #26507: [SPARK-29904][SQL][2.4]
Parse timestamps in microsecond precision by JSON/CSV datasources
URL: https://github.com/apache/spark/pull/26507#discussion_r346660847
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##########
@@ -1164,4 +1167,48 @@ object DateTimeUtils {
threadLocalTimestampFormat.remove()
threadLocalDateFormat.remove()
}
+
+ /**
+ * The custom sub-class of `GregorianCalendar` is needed to get access to
+ * the array of parsed `fields` immediately after parsing. We cannot use
+ * the `get()` method because it performs normalization of the fraction
+ * part. Accordingly, the `MILLISECOND` field doesn't contain the original value.
+ */
+ class MicrosCalendar(tz: TimeZone) extends GregorianCalendar(tz, Locale.US) {
+ // Converts the parsed `MILLISECOND` field to a seconds fraction in microsecond precision.
+ // For example, if the fraction pattern is `SSSS`, then `digitsInFraction` = 4 and
+ // the `MILLISECOND` field is parsed to `1234`.
+ def getMicros(digitsInFraction: Int): SQLTimestamp = {
+ // Append 6 zeros to the field by multiplying by `MICROS_PER_SECOND`: 1234 -> 1234000000
+ val d = fields(Calendar.MILLISECOND) * MICROS_PER_SECOND
+ // Take the first 6 digits from `d`: 1234000000 -> 123400
+ // The digits to drop number exactly `digitsInFraction` (`0000`), so divide by 10 ^ digitsInFraction
+ // So, the result is `(1234 * 1000000) / (10 ^ digitsInFraction)`
+ d / Decimal.POW_10(digitsInFraction)
+ }
+ }
+
+ /**
+ * An instance of this class is meant to be reused many times. It contains helper objects
+ * that can be reused between `parse()` invocations.
+ * @param format The parser itself.
+ * @param digitsInFraction The number of digits in the seconds fraction, precalculated
+ * from the pattern. For `ss.SSSS`, it is 4.
+ * @param cal The calendar which can get microseconds from the second fraction.
+ */
+ class DateTimeParser(format: FastDateFormat, digitsInFraction: Int, cal: MicrosCalendar) {
Review comment:
Can we construct the `digitsInFraction` and `cal` in the class body? It's
weird that the core logic of this class is done outside of this class.
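
A minimal sketch of what this suggestion could look like, using only the JDK so that it runs on its own. `MicrosCalendarSketch` and `DateTimeParserSketch` are hypothetical names, not the PR's classes. The sketch also assumes the parser is built from a pattern string rather than a `FastDateFormat`, and that counting `S` characters is enough to derive `digitsInFraction`, which would miscount an `S` inside a quoted literal.

```scala
import java.util.{Calendar, GregorianCalendar, Locale, TimeZone}

// Hypothetical stand-in for the PR's MicrosCalendar (JDK only).
class MicrosCalendarSketch(tz: TimeZone) extends GregorianCalendar(tz, Locale.US) {
  // Reads the raw MILLISECOND field (no normalization via get()) and
  // scales it to microseconds: field * 10^6 / 10^digitsInFraction.
  def getMicros(digitsInFraction: Int): Long = {
    require(digitsInFraction >= 0 && digitsInFraction <= 18, "unsupported fraction width")
    val scaled = fields(Calendar.MILLISECOND).toLong * 1000000L
    scaled / Seq.fill(digitsInFraction)(10L).product
  }
}

// Hypothetical variant of DateTimeParser: callers pass only the pattern and
// time zone, and both helpers are derived in the class body.
class DateTimeParserSketch(pattern: String, tz: TimeZone) {
  // Assumption: every 'S' in the pattern is a fraction digit (no quoted 'S').
  val digitsInFraction: Int = pattern.count(_ == 'S')
  val cal: MicrosCalendarSketch = new MicrosCalendarSketch(tz)
}

object DateTimeParserSketchDemo {
  def main(args: Array[String]): Unit = {
    val parser = new DateTimeParserSketch("yyyy-MM-dd'T'HH:mm:ss.SSSS", TimeZone.getTimeZone("UTC"))
    // Simulate what a parser does when it stores the fraction digits as-is.
    parser.cal.set(Calendar.MILLISECOND, 1234)
    println(parser.cal.getMicros(parser.digitsInFraction))
  }
}
```

With this shape the caller no longer has to count fraction digits or build the calendar, so the pieces that must stay consistent with the pattern live in one place.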