[ https://issues.apache.org/jira/browse/SPARK-38577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
chong updated SPARK-38577:
--------------------------
Description:

*Problem:*
ANSI interval types are stored as a long internally, but the long value is not truncated to the expected endField when creating a DataFrame via a Duration.

*Reproduce:*
Create a "day to day" interval; the seconds are not truncated, as the code below shows. The internal long is not {*}86400 * 1000000{*} but {*}(86400 + 1) * 1000000{*}.

{code:java}
import java.time.Duration

import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{DayTimeIntervalType, StructField, StructType}

test("my test") {
  val data = Seq(Row(Duration.ofDays(1).plusSeconds(1)))
  val schema = StructType(Array(
    StructField("t", DayTimeIntervalType(DayTimeIntervalType.DAY, DayTimeIntervalType.DAY))
  ))
  val df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
  df.show()
}
{code}

After debugging, {{endField}} is always {{SECOND}} in {{durationToMicros}}; see below:

{code:java}
// IntervalUtils
def durationToMicros(duration: Duration): Long = {
  durationToMicros(duration, DT.SECOND) // always SECOND
}

def durationToMicros(duration: Duration, endField: Byte): Long
{code}

It seems the conversion should use the endField declared by the interval type, which can be any of [DAY, HOUR, MINUTE, SECOND]. Alternatively, Spark could throw an exception instead of truncating.
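For the first suggestion, a minimal sketch of truncating to the declared endField is shown below. This is illustrative only, not the actual Spark change: {{TruncateSketch}} and {{truncateToEndField}} are hypothetical names; the field-code constants mirror the ones defined on {{DayTimeIntervalType}} (DAY = 0 through SECOND = 3).

{code:java}
import java.time.Duration
import java.util.concurrent.TimeUnit

// Hypothetical sketch: drop the sub-field remainder of a micros value so it
// fits the declared end field. Not Spark's actual implementation.
object TruncateSketch {
  // Field codes as defined on DayTimeIntervalType.
  val DAY: Byte = 0
  val HOUR: Byte = 1
  val MINUTE: Byte = 2
  val SECOND: Byte = 3

  def truncateToEndField(micros: Long, endField: Byte): Long = endField match {
    case DAY    => micros - micros % TimeUnit.DAYS.toMicros(1)
    case HOUR   => micros - micros % TimeUnit.HOURS.toMicros(1)
    case MINUTE => micros - micros % TimeUnit.MINUTES.toMicros(1)
    case SECOND => micros // fractional seconds belong to the SECOND field
  }

  def main(args: Array[String]): Unit = {
    // Duration.ofDays(1).plusSeconds(1) converts to (86400 + 1) * 1000000 micros.
    val micros = Duration.ofDays(1).plusSeconds(1).toNanos / 1000
    // Truncated to DAY, the stray second is dropped: 86400 * 1000000.
    assert(truncateToEndField(micros, DAY) == 86400L * 1000000L)
  }
}
{code}

The second suggestion would replace the subtraction with a check that the remainder is zero, throwing an exception so ill-fitting values fail loudly instead of being silently changed.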
> Interval types are not truncated to the expected endField when creating a
> DataFrame via Duration
> ------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-38577
>                 URL: https://issues.apache.org/jira/browse/SPARK-38577
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.3.0
>        Environment: Spark 3.3.0 snapshot version
>            Reporter: chong
>            Priority: Major

--
This message was sent by Atlassian Jira
(v8.20.1#820001)