AngersZhuuuu commented on a change in pull request #31979:
URL: https://github.com/apache/spark/pull/31979#discussion_r602843043
##########
File path:
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala
##########
@@ -647,6 +668,40 @@ private[hive] trait HiveInspectors {
null
}
}
+ case dt: HiveIntervalDayTimeObjectInspector if dt.preferWritable() =>
+ data: Any => {
+ if (data != null) {
+ val dayTime = dt.getPrimitiveWritableObject(data).getHiveIntervalDayTime
+ dayTime.getTotalSeconds * DateTimeConstants.NANOS_PER_SECOND + dayTime.getNanos
Review comment:
> Could you explain, please, when the nanoseconds are converted to
> microseconds because values of `DayTimeIntervalType` must contain microseconds.
>
> And also by converting to nanos and put it to `Long`, you get the risk of
> overflow. Could you write a test in which a day-time interval has
> `Long.MaxValue` microseconds.
Sorry, my mistake: I used the wrong unit. Spark stores `DayTimeIntervalType`
as a Long value in microseconds, while both
`Duration` and `HiveIntervalDayTime` store a time interval as
seconds + nanos.
If the value overflows when we create a `DayTimeIntervalType`, we get an error on
the Spark side, such as:
```
[info] java.lang.RuntimeException: Error while encoding:
java.lang.ArithmeticException: long overflow
[info] staticinvoke(class org.apache.spark.sql.catalyst.util.IntervalUtils$,
DayTimeIntervalType, durationToMicros, knownnotnull(assertnotnull(input[0,
scala.Tuple4, true]))._1, true, false) AS _1#0
[info] staticinvoke(class org.apache.spark.sql.catalyst.util.IntervalUtils$,
DayTimeIntervalType, durationToMicros, knownnotnull(assertnotnull(input[0,
scala.Tuple4, true]))._2, true, false) AS _2#1
[info] staticinvoke(class org.apache.spark.sql.catalyst.util.IntervalUtils$,
DayTimeIntervalType, durationToMicros, knownnotnull(assertnotnull(input[0,
scala.Tuple4, true]))._3, true, false) AS _3#2
[info] staticinvoke(class org.apache.spark.sql.catalyst.util.IntervalUtils$,
YearMonthIntervalType, periodToMonths, knownnotnull(assertnotnull(input[0,
scala.Tuple4, true]))._4, true, false) AS _4#3
[info]
```
But during the conversion itself:
1. To convert Spark data to `HiveIntervalDayTime`, we use the method below. It
should be safe, since the seconds won't overflow and the nanos are always less than 1000000000.
```
withNullSafe(o => {
  val duration = IntervalUtils.microsToDuration(o.asInstanceOf[Long])
  new HiveIntervalDayTime(duration.getSeconds, duration.getNano)
})
```
2. To convert `HiveIntervalDayTime` to the microseconds of Spark's `DayTimeIntervalType`,
we use
```
val dayTime = dt.getPrimitiveWritableObject(data).getHiveIntervalDayTime
IntervalUtils.durationToMicros(
  Duration.ofSeconds(dayTime.getTotalSeconds).plusNanos(dayTime.getNanos.toLong))
```
Since building the `Duration` from a `HiveIntervalDayTime` won't overflow, an
overflow can only happen in the call to `IntervalUtils.durationToMicros`, and that call
throws an exception. I think I need to add a unit test for this case.
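As a rough illustration of the two conversions above, here is a minimal sketch that uses only `java.time.Duration`, so it runs without Spark or Hive on the classpath. The helpers `microsToDuration`, `durationToMicros` and `fromSecondsAndNanos` in this object are hypothetical stand-ins written for the example; they are assumed to behave like the `IntervalUtils` methods for in-range values, and they are not Spark's actual code.

```scala
import java.time.Duration
import java.time.temporal.ChronoUnit
import scala.util.Try

object IntervalOverflowSketch {
  private val MicrosPerSecond = 1000000L
  private val NanosPerMicro = 1000L

  // Hypothetical stand-in for IntervalUtils.microsToDuration (assumption, not Spark's code).
  def microsToDuration(micros: Long): Duration = Duration.of(micros, ChronoUnit.MICROS)

  // Hypothetical stand-in for IntervalUtils.durationToMicros: checked arithmetic,
  // so an out-of-range value throws ArithmeticException instead of wrapping around.
  // Simplified: boundary handling near Long.MinValue is not covered here.
  def durationToMicros(d: Duration): Long = {
    val micros = Math.multiplyExact(d.getSeconds, MicrosPerSecond)
    Math.addExact(micros, d.getNano / NanosPerMicro)
  }

  // Same shape as the Hive-side conversion: rebuild a Duration from (seconds, nanos).
  def fromSecondsAndNanos(seconds: Long, nanos: Int): Long =
    durationToMicros(Duration.ofSeconds(seconds).plusNanos(nanos.toLong))

  // Long.MaxValue microseconds survives the round trip through seconds + nanos.
  val maxDuration: Duration = microsToDuration(Long.MaxValue)
  val roundTrip: Long = fromSecondsAndNanos(maxDuration.getSeconds, maxDuration.getNano)

  // One second beyond the representable range fails loudly rather than wrapping.
  val tooLarge: Try[Long] = Try(fromSecondsAndNanos(Long.MaxValue / MicrosPerSecond + 1, 0))
}
```

In this sketch `roundTrip` equals `Long.MaxValue`, while `tooLarge` is a failure carrying an `ArithmeticException`, which matches the behaviour described above: the split into seconds + nanos is safe, and only the final conversion back to microseconds can overflow.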