[GitHub] [spark] AngersZhuuuu commented on a change in pull request #31979: [SPARK-34879][SQL][test-hive1.2] HiveInspector support DayTimeIntervalType and YearMonthIntervalType

GitBox Sun, 28 Mar 2021 00:14:42 -0700


AngersZhuuuu commented on a change in pull request #31979:
URL: https://github.com/apache/spark/pull/31979#discussion_r602843043




##########
File path: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveInspectors.scala
##########
@@ -647,6 +668,40 @@ private[hive] trait HiveInspectors {
               null
             }
           }
+        case dt: HiveIntervalDayTimeObjectInspector if dt.preferWritable() =>
+          data: Any => {
+            if (data != null) {
+              val dayTime = dt.getPrimitiveWritableObject(data).getHiveIntervalDayTime
+              dayTime.getTotalSeconds * DateTimeConstants.NANOS_PER_SECOND + dayTime.getNanos

Review comment:
   > Could you explain, please, when the nanoseconds are converted to microseconds because values of `DayTimeIntervalType` must contain microseconds.
   > 
   > And also by converting to nanos and put it to `Long`, you get the risk of overflow. Could you write a test in which a day-time interval has `Long.MaxValue` microseconds.
   
   Sorry, my mistake: I used the wrong unit. Spark stores a `DayTimeIntervalType` value as a Long number of microseconds.
   `Duration` stores a time interval as seconds + nanos.
   `HiveIntervalDayTime` also stores seconds + nanos.
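   To make the unit mix-up concrete, here is a small sketch contrasting the formula from the diff above, which yields nanoseconds, with the microsecond value that `DayTimeIntervalType` expects. The object, method, and constant names are illustrative only; the constants are the usual seconds/micros/nanos ratios, not Spark's `DateTimeConstants`.

   ```scala
   import java.time.Duration

   object IntervalUnits {
     val MicrosPerSecond: Long = 1000000L
     val NanosPerSecond: Long = 1000000000L
     val NanosPerMicro: Long = 1000L

     // Same shape as the formula in the diff: the result is in NANOSECONDS.
     def toNanos(totalSeconds: Long, nanos: Int): Long =
       totalSeconds * NanosPerSecond + nanos

     // What DayTimeIntervalType holds: MICROSECONDS (sub-microsecond nanos are dropped).
     def toMicros(totalSeconds: Long, nanos: Int): Long =
       totalSeconds * MicrosPerSecond + nanos / NanosPerMicro

     def main(args: Array[String]): Unit = {
       // 1 day, 2 seconds and 3 microseconds, split into (seconds, nanos) by java.time
       val d = Duration.ofDays(1).plusSeconds(2).plusNanos(3000)
       println(toNanos(d.getSeconds, d.getNano))  // 86402000003000
       println(toMicros(d.getSeconds, d.getNano)) // 86402000003
     }
   }
   ```

   The two results differ by exactly a factor of 1000, which is the unit error the reviewer pointed out.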
   
   
   If an overflow happens when creating a `DayTimeIntervalType` value, Spark throws an error, for example:
   ```
   [info]   java.lang.RuntimeException: Error while encoding: java.lang.ArithmeticException: long overflow
   [info] staticinvoke(class org.apache.spark.sql.catalyst.util.IntervalUtils$, DayTimeIntervalType, durationToMicros, knownnotnull(assertnotnull(input[0, scala.Tuple4, true]))._1, true, false) AS _1#0
   [info] staticinvoke(class org.apache.spark.sql.catalyst.util.IntervalUtils$, DayTimeIntervalType, durationToMicros, knownnotnull(assertnotnull(input[0, scala.Tuple4, true]))._2, true, false) AS _2#1
   [info] staticinvoke(class org.apache.spark.sql.catalyst.util.IntervalUtils$, DayTimeIntervalType, durationToMicros, knownnotnull(assertnotnull(input[0, scala.Tuple4, true]))._3, true, false) AS _3#2
   [info] staticinvoke(class org.apache.spark.sql.catalyst.util.IntervalUtils$, YearMonthIntervalType, periodToMonths, knownnotnull(assertnotnull(input[0, scala.Tuple4, true]))._4, true, false) AS _4#3
   [info]
   ```
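   The `long overflow` text in this log matches the message that the JDK's `Math.multiplyExact` and `Math.addExact` throw for `Long` arguments. As a sketch with hypothetical names, this is the difference between plain `Long` multiplication, which wraps silently, and exact arithmetic, which fails loudly, when the whole seconds of `Long.MaxValue` microseconds are scaled to nanoseconds the way the original diff did:

   ```scala
   object OverflowDemo {
     val NanosPerSecond: Long = 1000000000L
     val MicrosPerSecond: Long = 1000000L

     // Plain Long multiplication: wraps around silently on overflow.
     def nanosUnchecked(seconds: Long): Long = seconds * NanosPerSecond

     // Exact multiplication: throws ArithmeticException("long overflow") instead of wrapping.
     def nanosChecked(seconds: Long): Long = Math.multiplyExact(seconds, NanosPerSecond)

     def main(args: Array[String]): Unit = {
       // Whole seconds contained in Long.MaxValue microseconds.
       val seconds = Long.MaxValue / MicrosPerSecond
       println(s"unchecked: ${nanosUnchecked(seconds)}") // wrapped, meaningless value
       val checked =
         try nanosChecked(seconds).toString
         catch { case e: ArithmeticException => s"ArithmeticException: ${e.getMessage}" }
       println(s"checked: $checked") // checked: ArithmeticException: long overflow
     }
   }
   ```

   This is why a silent nanosecond `Long` is risky, while a conversion built on exact arithmetic surfaces the problem as an exception like the one above.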
   
   However, during the computation:
   1. To convert Spark data to `HiveIntervalDayTime`, we use the method below. It should be safe: the seconds part cannot overflow, and the nanos part is always less than one second (1,000,000,000 nanos).
   ```
   withNullSafe(o => {
     // o is a DayTimeIntervalType value: a Long number of microseconds.
     val duration = IntervalUtils.microsToDuration(o.asInstanceOf[Long])
     new HiveIntervalDayTime(duration.getSeconds, duration.getNano)
   })
   ```
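   Step 1 can be checked in isolation with plain `java.time`. The sketch below assumes that `IntervalUtils.microsToDuration` behaves like `Duration.of(micros, ChronoUnit.MICROS)` (an assumption; the helper names here are hypothetical) and shows that even the extreme `Long` inputs produce a representable seconds value and a nanos value in `[0, 999999999]`, which fits the `Int` (`duration.getNano`) passed to `HiveIntervalDayTime` in the snippet above.

   ```scala
   import java.time.Duration
   import java.time.temporal.ChronoUnit

   object MicrosToSecondsNanos {
     // Assumed stand-in for IntervalUtils.microsToDuration.
     def microsToDuration(micros: Long): Duration = Duration.of(micros, ChronoUnit.MICROS)

     // Split into the (seconds, nanos) pair that the HiveIntervalDayTime constructor takes.
     def toSecondsAndNanos(micros: Long): (Long, Int) = {
       val d = microsToDuration(micros)
       (d.getSeconds, d.getNano)
     }

     def main(args: Array[String]): Unit = {
       for (micros <- Seq(Long.MaxValue, Long.MinValue, -1L)) {
         val (seconds, nanos) = toSecondsAndNanos(micros)
         // Duration normalizes nanos to [0, 999999999], even for negative intervals.
         println(s"$micros -> seconds=$seconds, nanos=$nanos")
       }
     }
   }
   ```

   For `Long.MaxValue` this yields `seconds=9223372036854, nanos=775807000`, and for `-1` it yields `seconds=-1, nanos=999999000`: the sign lives in the seconds part, and the nanos part stays non-negative.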
   2. To convert `HiveIntervalDayTime` back to the microseconds of Spark's `DayTimeIntervalType`, we use:
   ```
   // Rebuild the (seconds, nanos) pair as a java.time.Duration, then convert it to microseconds.
   val dayTime = dt.getPrimitiveWritableObject(data).getHiveIntervalDayTime
   IntervalUtils.durationToMicros(
     Duration.ofSeconds(dayTime.getTotalSeconds).plusNanos(dayTime.getNanos.toLong))
   ```
   Since converting `HiveIntervalDayTime` to a `Duration` cannot overflow, an overflow can only happen in `IntervalUtils.durationToMicros`, and in that case it throws an exception rather than returning a wrong value. I think I need to add a unit test for this case.
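   A self-contained version of such a check might look like the sketch below. It round-trips the extreme microsecond values through step 1 and step 2 and verifies that one second beyond the largest interval fails. `durationToMicros` here is a hypothetical stand-in written with `Math.multiplyExact`/`Math.addExact`, not Spark's `IntervalUtils.durationToMicros`; for negative durations it borrows one second into the nanos part so that `Long.MinValue` microseconds does not overflow on the way back.

   ```scala
   import java.time.Duration
   import java.time.temporal.ChronoUnit
   import scala.util.Try

   object DayTimeRoundTripCheck {
     val MicrosPerSecond: Long = 1000000L
     val NanosPerSecond: Long = 1000000000L
     val NanosPerMicro: Long = 1000L

     // Hypothetical stand-in for IntervalUtils.durationToMicros (not Spark's code).
     // Exact arithmetic makes out-of-range durations throw instead of wrapping.
     def durationToMicros(d: Duration): Long = {
       val seconds = d.getSeconds
       val nanos = d.getNano.toLong // always in [0, 999999999]
       if (seconds < 0 && nanos > 0) {
         // Borrow one second so (seconds + 1) * 1e6 stays in range at the negative end;
         // floorDiv keeps the result equal to floor(totalNanos / 1000).
         Math.addExact(
           Math.multiplyExact(seconds + 1, MicrosPerSecond),
           Math.floorDiv(nanos - NanosPerSecond, NanosPerMicro))
       } else {
         Math.addExact(Math.multiplyExact(seconds, MicrosPerSecond), nanos / NanosPerMicro)
       }
     }

     // Step 2: rebuild a Duration from (totalSeconds, nanos), then convert to micros.
     def secondsNanosToMicros(totalSeconds: Long, nanos: Int): Long =
       durationToMicros(Duration.ofSeconds(totalSeconds).plusNanos(nanos.toLong))

     // Step 1 (micros -> seconds + nanos) followed by step 2.
     def roundTrip(micros: Long): Long = {
       val d = Duration.of(micros, ChronoUnit.MICROS)
       secondsNanosToMicros(d.getSeconds, d.getNano)
     }

     def main(args: Array[String]): Unit = {
       for (m <- Seq(Long.MaxValue, Long.MinValue, 0L, -1L))
         assert(roundTrip(m) == m, s"round trip failed for $m")
       // One second beyond the largest representable interval must fail loudly.
       val tooLarge = Duration.of(Long.MaxValue, ChronoUnit.MICROS).plusSeconds(1)
       assert(Try(durationToMicros(tooLarge)).isFailure)
       println("ok")
     }
   }
   ```

   In a real Spark test the same boundary values would go through `HiveInspectors` and the actual `IntervalUtils` helpers instead of these stand-ins.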




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



