MaxGekk commented on a change in pull request #26102: [SPARK-29448][SQL] Support the `INTERVAL` type by Parquet datasource
URL: https://github.com/apache/spark/pull/26102#discussion_r335287678
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
##########
@@ -325,6 +325,26 @@ private[parquet] class ParquetRowConverter(
       override def set(value: Any): Unit =
         updater.set(value.asInstanceOf[InternalRow].copy())
     })
+    case CalendarIntervalType
+        if parquetType.asPrimitiveType().getPrimitiveTypeName == FIXED_LEN_BYTE_ARRAY =>
+      new ParquetPrimitiveConverter(updater) {
+        override def addBinary(value: Binary): Unit = {
+          assert(
+            value.length() == 12,
+            "Intervals are expected to be stored in 12-byte fixed len byte array, " +
+              s"but got a ${value.length()}-byte array.")
+
+          val buf = value.toByteBuffer.order(ByteOrder.LITTLE_ENDIAN)
+          val milliseconds = buf.getInt
+          var microseconds = milliseconds * DateTimeUtils.MICROS_PER_MILLIS
+          val days = buf.getInt
+          val daysInUs = Math.multiplyExact(days, DateTimeUtils.MICROS_PER_DAY)
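
For readers outside the diff context, below is a self-contained sketch of the same decoding. It is an illustration, not Spark code: the constants are inlined instead of taken from `DateTimeUtils`, and the little-endian milliseconds/days/months field order is assumed from the patch (only the first two reads are visible above).

```scala
import java.nio.{ByteBuffer, ByteOrder}

object IntervalDecodeSketch {
  // Inlined for self-containment; Spark defines these in DateTimeUtils.
  val MICROS_PER_MILLIS: Long = 1000L
  val MICROS_PER_DAY: Long = 24L * 60 * 60 * 1000 * 1000

  // Returns (months, microseconds covered by the days + milliseconds fields).
  def decode(bytes: Array[Byte]): (Int, Long) = {
    require(bytes.length == 12, s"expected a 12-byte array, got ${bytes.length} bytes")
    val buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN)
    val milliseconds = buf.getInt  // same read order as the patch above
    val days = buf.getInt
    val months = buf.getInt        // assumed to be the remaining field
    val micros = Math.addExact(
      milliseconds * MICROS_PER_MILLIS,
      Math.multiplyExact(days.toLong, MICROS_PER_DAY))
    (months, micros)
  }
}
```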
Review comment:
Don't want to defend another side :-) but a consequence of storing days separately is that hours are unbounded. In this way, `interval 1 day 25 hours` and `interval 2 days 1 hour` are represented differently in parquet, as the (months, days, milliseconds) triples (0, 1, 90000000) and (0, 2, 3600000). As @cloud-fan wrote above, this can lead to different results when adding those intervals to 2 November 2019, because the next day, 3 November 2019, is a DST fall-back day that lasts 25 physical hours (in, e.g., America/Los_Angeles): `2019-11-02` + `interval 1 day 25 hours` = `2019-11-04 00:00:00`, but `2019-11-02` + `interval 2 days 1 hour` = `2019-11-04 01:00:00`.
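
The DST effect is easy to reproduce with plain `java.time` (a minimal sketch, not Spark code; the comment doesn't pin a time zone, so `America/Los_Angeles` is an assumption — any zone that left DST on 2019-11-03 behaves the same):

```scala
import java.time.{Duration, LocalDate, Period, ZoneId}

object IntervalDstDemo extends App {
  // America/Los_Angeles fell back from PDT to PST on 2019-11-03,
  // so that calendar day lasted 25 physical hours.
  val zone = ZoneId.of("America/Los_Angeles")
  val start = LocalDate.of(2019, 11, 2).atStartOfDay(zone)

  // (0, 1, 90000000): one calendar-aware day, then 25 hours of fixed duration.
  val a = start.plus(Period.ofDays(1)).plus(Duration.ofHours(25))

  // (0, 2, 3600000): two calendar-aware days, then 1 hour of fixed duration.
  val b = start.plus(Period.ofDays(2)).plus(Duration.ofHours(1))

  println(a)  // 2019-11-04T00:00-08:00[America/Los_Angeles]
  println(b)  // 2019-11-04T01:00-08:00[America/Los_Angeles]
}
```

Once the 25-hour day is crossed, the two encodings stop denoting the same instant, which is why collapsing the days field into microseconds on read changes the semantics.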