chong created SPARK-38577:
-----------------------------

             Summary: Interval types are not truncated to the expected endField when creating a DataFrame via Duration
                 Key: SPARK-38577
                 URL: https://issues.apache.org/jira/browse/SPARK-38577
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.3.0
         Environment: Spark 3.3.0 snapshot version
            Reporter: chong


*Problem:*

ANSI interval types are stored as a long internally. The long value is not truncated to the expected endField when creating a DataFrame via Duration.

*Reproduce:*

Create a "day to day" interval; the seconds are not truncated, see the code below. The internal long is not {*}86400 * 1000000{*}, but {*}(86400 + 1) * 1000000{*}.

{code:java}
test("my test") {
  val data = Seq(Row(Duration.ofDays(1).plusSeconds(1)))
  val schema = StructType(Array(
    StructField("t",
      DayTimeIntervalType(DayTimeIntervalType.DAY, DayTimeIntervalType.DAY))
  ))
  val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
  df.show()
}
{code}

After debugging, the {{endField}} is always {{SECOND}} in {{durationToMicros}}, see below:

{code:java}
// IntervalUtils class
def durationToMicros(duration: Duration): Long = {
  durationToMicros(duration, DT.SECOND) // always SECOND
}

def durationToMicros(duration: Duration, endField: Byte)
{code}

It seems a different endField, which could be one of [DAY, HOUR, MINUTE, SECOND], should be used depending on the declared interval type.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
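To illustrate the expected behavior, below is a minimal, self-contained sketch (not Spark's actual implementation) of a hypothetical truncation step that {{durationToMicros}} could apply when the declared endField is coarser than SECOND. The object and method names are made up for this example; the field byte values (DAY = 0, HOUR = 1, MINUTE = 2, SECOND = 3) follow {{DayTimeIntervalType}}'s constants.

{code:java}
// Hypothetical sketch: truncate a microseconds value to an interval's endField.
object EndFieldTruncationSketch {
  val MicrosPerSecond: Long = 1000000L
  val MicrosPerMinute: Long = 60L * MicrosPerSecond
  val MicrosPerHour: Long   = 60L * MicrosPerMinute
  val MicrosPerDay: Long    = 24L * MicrosPerHour

  // Drop any component finer than endField; `%` truncates toward zero,
  // so negative intervals lose their fractional part symmetrically.
  def truncateToEndField(micros: Long, endField: Byte): Long = endField match {
    case 0 => micros - micros % MicrosPerDay    // DAY
    case 1 => micros - micros % MicrosPerHour   // HOUR
    case 2 => micros - micros % MicrosPerMinute // MINUTE
    case 3 => micros                            // SECOND keeps microsecond precision
    case f => throw new IllegalArgumentException(s"Unknown endField: $f")
  }
}
{code}

With this sketch, the reproducer's value {{(86400 + 1) * 1000000}} truncated with endField DAY would come out as {{86400 * 1000000}}, which is the internal long the bug report expects for a "day to day" interval.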