Github user tejasapatil commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17062#discussion_r105570430
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
    @@ -168,6 +170,208 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
         // scalastyle:on nonascii
       }
     
    +  test("hive-hash for date type") {
    +    def checkHiveHashForDateType(dateString: String, expected: Long): Unit = {
    +      checkHiveHash(
    +        DateTimeUtils.stringToDate(UTF8String.fromString(dateString)).get,
    +        DateType,
    +        expected)
    +    }
    +
    +    // basic case
    +    checkHiveHashForDateType("2017-01-01", 17167)
    +
    +    // boundary cases
    +    checkHiveHashForDateType("0000-01-01", -719530)
    +    checkHiveHashForDateType("9999-12-31", 2932896)
    +
    +    // epoch
    +    checkHiveHashForDateType("1970-01-01", 0)
    +
    +    // before epoch
    +    checkHiveHashForDateType("1800-01-01", -62091)
    +
    +    // Invalid input: bad date string. Hive returns 0 for such cases
    +    intercept[NoSuchElementException](checkHiveHashForDateType("0-0-0", 0))
    +    intercept[NoSuchElementException](checkHiveHashForDateType("-1212-01-01", 0))
    +    intercept[NoSuchElementException](checkHiveHashForDateType("2016-99-99", 0))
    +
    +    // Invalid input: Empty string. Hive returns 0 for this case
    +    intercept[NoSuchElementException](checkHiveHashForDateType("", 0))
    +
    +    // Invalid input: February 30th for a leap year. Hive supports this but Spark doesn't
    +    intercept[NoSuchElementException](checkHiveHashForDateType("2016-02-30", 16861))
    +  }
    +
    +  test("hive-hash for timestamp type") {
    +    def checkHiveHashForTimestampType(
    +        timestamp: String,
    +        expected: Long,
    +        timeZone: TimeZone = TimeZone.getTimeZone("UTC")): Unit = {
    +      checkHiveHash(
    +        DateTimeUtils.stringToTimestamp(UTF8String.fromString(timestamp), timeZone).get,
    +        TimestampType,
    +        expected)
    +    }
    +
    +    // basic case
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29", 1445725271)
    +
    +    // with higher precision
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29.111111", 1353936655)
    +
    +    // with different timezone
    +    checkHiveHashForTimestampType("2017-02-24 10:56:29", 1445732471,
    +      TimeZone.getTimeZone("US/Pacific"))
    +
    +    // boundary cases
    +    checkHiveHashForTimestampType("0001-01-01 00:00:00", 1645926784)
    +    checkHiveHashForTimestampType("9999-01-01 00:00:00", -1081818240)
    +
    +    // epoch
    +    checkHiveHashForTimestampType("1970-01-01 00:00:00", 0)
    +
    +    // before epoch
    +    checkHiveHashForTimestampType("1800-01-01 03:12:45", -267420885)
    +
    +    // Invalid input: bad timestamp string. Hive returns 0 for such cases
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("0-0-0 0:0:0", 0))
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("-99-99-99 99:99:45", 0))
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("555555-55555-5555", 0))
    +
    +    // Invalid input: Empty string. Hive returns 0 for this case
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("", 0))
    +
    +    // Invalid input: February 30th for a leap year. Hive supports this but Spark doesn't
    +    intercept[NoSuchElementException](checkHiveHashForTimestampType("2016-02-30 00:00:00", 0))
    +
    +    // Invalid input: Hive accepts up to 9 decimal places of precision but Spark uses up to 6
    +    intercept[TestFailedException](checkHiveHashForTimestampType("2017-02-24 10:56:29.11111111", 0))
    +  }
    +
    +  test("hive-hash for CalendarInterval type") {
    +    def checkHiveHashForIntervalType(interval: String, expected: Long): Unit = {
    +      checkHiveHash(CalendarInterval.fromString(interval), CalendarIntervalType, expected)
    +    }
    +
    +    // ----- MICROSEC -----
    +
    +    // basic case
    +    checkHiveHashForIntervalType("interval 1 microsecond", 24273)
    +
    +    // negative
    +    checkHiveHashForIntervalType("interval -1 microsecond", 22273)
    +
    +    // edge / boundary cases
    +    checkHiveHashForIntervalType("interval 0 microsecond", 23273)
    +    checkHiveHashForIntervalType("interval 999 microsecond", 1022273)
    +    checkHiveHashForIntervalType("interval -999 microsecond", -975727)
    +
    +    // ----- MILLISEC -----
    +
    +    // basic case
    +    checkHiveHashForIntervalType("interval 1 millisecond", 1023273)
    +
    +    // negative
    +    checkHiveHashForIntervalType("interval -1 millisecond", -976727)
    +
    +    // edge / boundary cases
    +    checkHiveHashForIntervalType("interval 0 millisecond", 23273)
    +    checkHiveHashForIntervalType("interval 999 millisecond", 999023273)
    +    checkHiveHashForIntervalType("interval -999 millisecond", -998976727)
    +
    +    // ----- SECOND -----
    +
    +    // basic case
    +    checkHiveHashForIntervalType("interval 1 second", 23310)
    +
    +    // negative
    +    checkHiveHashForIntervalType("interval -1 second", 23273)
    +
    +    // edge / boundary cases
    +    checkHiveHashForIntervalType("interval 0 second", 23273)
    +    checkHiveHashForIntervalType("interval 2147483647 second", -2147460412)
    +    checkHiveHashForIntervalType("interval -2147483648 second", -2147460412)
    +
    +    // Out of range for both Hive and Spark
    +    // Hive throws an exception. Spark overflows and returns wrong output
    +    // checkHiveHashForIntervalType("interval 9999999999 day", -4767228)
    --- End diff --
    
    In the case of Spark SQL, the query fails with an exception (see below). However, the test case does not hit that check since I am bypassing the parser and creating the raw interval object directly.
    
    ```
    scala> hc.sql("SELECT interval 9999999999 day ").show
    org.apache.spark.sql.catalyst.parser.ParseException:
    Error parsing interval string: day 9999999999 outside range [-106751991, 106751991](line 1, pos 16)
    
    == SQL ==
    SELECT interval 9999999999 day
    ```
    
    ```
    scala> df.select("INTERVAL 9999999999 day").show()
    org.apache.spark.sql.AnalysisException: cannot resolve '`INTERVAL 9999999999 day`' given input columns: [key, value];;
    'Project ['INTERVAL 9999999999 day]
    ```
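    
    To make the bypass concrete, here is a minimal sketch (not part of the PR; it assumes a REPL with `CalendarInterval` on the classpath) contrasting the parser path above with the direct-construction path used by the test helper:
    
    ```
    import org.apache.spark.unsafe.types.CalendarInterval
    
    // Going through the SQL parser validates the day count and raises the
    // ParseException shown above for "interval 9999999999 day".
    
    // The test helper constructs the interval directly, skipping that check,
    // so the day count silently overflows when converted to microseconds:
    val raw = CalendarInterval.fromString("interval 9999999999 day")
    // raw.microseconds now holds an overflowed Long, so HiveHash would hash a
    // wrong value; that is why the test above is left commented out.
    ```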

