[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

2017-03-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/17062



[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

2017-03-12 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/17062#discussion_r105570634
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
@@ -168,6 +170,208 @@ class HashExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper {
     // scalastyle:on nonascii
   }
 
+  test("hive-hash for date type") {
+    def checkHiveHashForDateType(dateString: String, expected: Long): Unit = {
+      checkHiveHash(
+        DateTimeUtils.stringToDate(UTF8String.fromString(dateString)).get,
+        DateType,
+        expected)
+    }
+
+    // basic case
+    checkHiveHashForDateType("2017-01-01", 17167)
+
+    // boundary cases
+    checkHiveHashForDateType("0000-01-01", -719530)
+    checkHiveHashForDateType("9999-12-31", 2932896)
+
+    // epoch
+    checkHiveHashForDateType("1970-01-01", 0)
+
+    // before epoch
+    checkHiveHashForDateType("1800-01-01", -62091)
+
+    // Invalid input: bad date string. Hive returns 0 for such cases
+    intercept[NoSuchElementException](checkHiveHashForDateType("0-0-0", 0))
+    intercept[NoSuchElementException](checkHiveHashForDateType("-1212-01-01", 0))
+    intercept[NoSuchElementException](checkHiveHashForDateType("2016-99-99", 0))
+
+    // Invalid input: Empty string. Hive returns 0 for this case
+    intercept[NoSuchElementException](checkHiveHashForDateType("", 0))
+
+    // Invalid input: February 30th for a leap year. Hive supports this but Spark doesn't
+    intercept[NoSuchElementException](checkHiveHashForDateType("2016-02-30", 16861))
+  }
+
+  test("hive-hash for timestamp type") {
+    def checkHiveHashForTimestampType(
+        timestamp: String,
+        expected: Long,
+        timeZone: TimeZone = TimeZone.getTimeZone("UTC")): Unit = {
+      checkHiveHash(
+        DateTimeUtils.stringToTimestamp(UTF8String.fromString(timestamp), timeZone).get,
+        TimestampType,
+        expected)
+    }
+
+    // basic case
+    checkHiveHashForTimestampType("2017-02-24 10:56:29", 1445725271)
+
+    // with higher precision
+    checkHiveHashForTimestampType("2017-02-24 10:56:29.11", 1353936655)
+
+    // with different timezone
+    checkHiveHashForTimestampType("2017-02-24 10:56:29", 1445732471,
+      TimeZone.getTimeZone("US/Pacific"))
+
+    // boundary cases
+    checkHiveHashForTimestampType("0001-01-01 00:00:00", 1645926784)
+    checkHiveHashForTimestampType("9999-01-01 00:00:00", -1081818240)
+
+    // epoch
+    checkHiveHashForTimestampType("1970-01-01 00:00:00", 0)
+
+    // before epoch
+    checkHiveHashForTimestampType("1800-01-01 03:12:45", -267420885)
+
+    // Invalid input: bad timestamp string. Hive returns 0 for such cases
+    intercept[NoSuchElementException](checkHiveHashForTimestampType("0-0-0 0:0:0", 0))
+    intercept[NoSuchElementException](checkHiveHashForTimestampType("-99-99-99 99:99:45", 0))
+    intercept[NoSuchElementException](checkHiveHashForTimestampType("55-5-", 0))
+
+    // Invalid input: Empty string. Hive returns 0 for this case
+    intercept[NoSuchElementException](checkHiveHashForTimestampType("", 0))
+
+    // Invalid input: February 30th for a leap year. Hive supports this but Spark doesn't
+    intercept[NoSuchElementException](checkHiveHashForTimestampType("2016-02-30 00:00:00", 0))
+
+    // Invalid input: Hive accepts up to 9 decimal places of precision but Spark uses up to 6
+    intercept[TestFailedException](checkHiveHashForTimestampType("2017-02-24 10:56:29.", 0))
+  }
+
+  test("hive-hash for CalendarInterval type") {
+    def checkHiveHashForIntervalType(interval: String, expected: Long): Unit = {
+      checkHiveHash(CalendarInterval.fromString(interval), CalendarIntervalType, expected)
+    }
+
+    // - MICROSEC -
+
+    // basic case
+    checkHiveHashForIntervalType("interval 1 microsecond", 24273)
+
+    // negative
+    checkHiveHashForIntervalType("interval -1 microsecond", 22273)
+
+    // edge / boundary cases
+    checkHiveHashForIntervalType("interval 0 microsecond", 23273)
+    checkHiveHashForIntervalType("interval 999 microsecond", 1022273)
+    checkHiveHashForIntervalType("interval -999 microsecond", -975727)
+
+    // - MILLISEC -
+
+    // basic case
+    checkHiveHashForIntervalType("interval 1 millisecond", 1023273)
+
+    // negative
+    checkHiveHashForIntervalType("interval -1 millisecond", -976727)
+
+    // edge / boundary cases
+    checkHiveHashForIntervalType("interval 0 

[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

2017-03-12 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/17062#discussion_r105570430
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---

[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

2017-03-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17062#discussion_r105570229
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---

[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

2017-03-12 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17062#discussion_r105569790
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---

[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

2017-03-09 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/17062#discussion_r105265211
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
+  test("hive-hash for CalendarInterval type") {
--- End diff --

Hive queries for all the tests below. Outputs are generated by running against Hive 1.2.1:

```
// - MICROSEC -
SELECT HASH(interval_day_time("0 0:0:0.000001") );
SELECT HASH(interval_day_time("-0 0:0:0.000001") );
SELECT HASH(interval_day_time("0 0:0:0.000000") );
SELECT HASH(interval_day_time("0 0:0:0.000999") );
SELECT HASH(interval_day_time("-0 0:0:0.000999") );

// - MILLISEC -
SELECT HASH(interval_day_time("0 0:0:0.001") );
SELECT HASH(interval_day_time("-0 0:0:0.001") );
SELECT HASH(interval_day_time("0 0:0:0.000") );
SELECT HASH(interval_day_time("0 0:0:0.999") );
SELECT HASH(interval_day_time("-0 0:0:0.999") );

// - SECOND -
SELECT HASH( INTERVAL '1' SECOND);
SELECT HASH( INTERVAL '-1' SECOND);
SELECT HASH( INTERVAL '0' SECOND);
SELECT HASH( INTERVAL '2147483647' SECOND);
SELECT HASH( INTERVAL '-2147483648' SECOND);

// - MINUTE -
SELECT HASH( 

[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

2017-03-09 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/17062#discussion_r105261953
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
+  test("hive-hash for CalendarInterval type") {
+def checkHiveHashForTimestampType(interval: String, expected: Long): 
Unit = {
+  checkHiveHash(CalendarInterval.fromString(interval), 
CalendarIntervalType, expected)
+}
+
+checkHiveHashForTimestampType("interval 1 day", 3220073)
+checkHiveHashForTimestampType("interval 6 day 15 hour", 21202073)
+checkHiveHashForTimestampType("interval -23 day 56 hour -113 
minute 9898989 second",
+  -2128468593)
--- End diff --

added




[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

2017-03-09 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/17062#discussion_r105261981
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala ---
@@ -732,6 +741,38 @@ object HiveHashFunction extends InterpretedHashFunction {
     HiveHasher.hashUnsafeBytes(base, offset, len)
   }
 
+  /**
+   * Mimics TimestampWritable.hashCode() in Hive
+   */
+  def hashTimestamp(timestamp: Long): Long = {
+    val timestampInSeconds = timestamp / 1000000
+    val nanoSecondsPortion = (timestamp % 1000000) * 1000
+
+    var result = timestampInSeconds
+    result <<= 30 // the nanosecond part fits in 30 bits
+    result |= nanoSecondsPortion
+    ((result >>> 32) ^ result).toInt
+  }
+
+  /**
+   * Hive allows input intervals to be defined using the units below, but the intervals
+   * have to be from the same category:
+   * - year, month (stored as HiveIntervalYearMonth)
+   * - day, hour, minute, second, nanosecond (stored as HiveIntervalDayTime)
+   *
+   * eg. (INTERVAL '30' YEAR + INTERVAL '-23' DAY) fails in Hive
+   *
+   * This method mimics HiveIntervalDayTime.hashCode() in Hive. If the `INTERVAL` is backed
+   * by HiveIntervalYearMonth in Hive, then this method will not produce a Hive-compatible
+   * result, because Spark's representation of calendar intervals is unified and does not
+   * have such categories.
+   */
+  def hashCalendarInterval(calendarInterval: CalendarInterval): Long = {
+    val totalSeconds = calendarInterval.milliseconds() / 1000
--- End diff --

Spark's CalendarInterval has precision up to microseconds while Hive can have precision up to nanoseconds, so there is no way for us to support that in the hashing function. I have documented this in the PR.
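
For context, a minimal sketch of the day-time interval hashing being discussed, assuming Hive's `HiveIntervalDayTime.hashCode()` behaves like a commons-lang `HashCodeBuilder(17, 37)` fold over (totalSeconds, nanos); it reproduces the expected values from the unit tests quoted in this thread:

```
import org.apache.spark.unsafe.types.CalendarInterval

// Sketch, not the PR's exact code. Seconds come from milliseconds() / 1000 and the
// nanosecond field is rebuilt from the leftover microseconds, so it is always a
// multiple of 1000 -- this is exactly where sub-microsecond Hive precision is lost.
def hashCalendarInterval(interval: CalendarInterval): Long = {
  val totalSeconds = interval.milliseconds() / 1000
  val nanos = (interval.microseconds - totalSeconds * 1000000L).toInt * 1000
  var result = 17
  result = result * 37 + (totalSeconds ^ (totalSeconds >>> 32)).toInt // fold the Long
  result * 37 + nanos
}

// Matches the unit-test expectations quoted earlier in this thread:
assert(hashCalendarInterval(CalendarInterval.fromString("interval 0 microsecond")) == 23273)
assert(hashCalendarInterval(CalendarInterval.fromString("interval 1 microsecond")) == 24273)
assert(hashCalendarInterval(CalendarInterval.fromString("interval 1 day")) == 3220073)
```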




[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

2017-03-04 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17062#discussion_r104282934
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala
 ---
@@ -732,6 +741,38 @@ object HiveHashFunction extends 
InterpretedHashFunction {
 HiveHasher.hashUnsafeBytes(base, offset, len)
   }
 
+  /**
+   * Mimics TimestampWritable.hashCode() in Hive
+   */
+  def hashTimestamp(timestamp: Long): Long = {
+val timestampInSeconds = timestamp / 100
+val nanoSecondsPortion = (timestamp % 100) * 1000
+
+var result = timestampInSeconds
+result <<= 30 // the nanosecond part fits in 30 bits
+result |= nanoSecondsPortion
+((result >>> 32) ^ result).toInt
+  }
+
+  /**
+   * Hive allows input intervals to be defined using units below but the 
intervals
+   * have to be from the same category:
+   * - year, month (stored as HiveIntervalYearMonth)
+   * - day, hour, minute, second, nanosecond (stored as 
HiveIntervalDayTime)
+   *
+   * eg. (INTERVAL '30' YEAR + INTERVAL '-23' DAY) fails in Hive
+   *
+   * This method mimics HiveIntervalDayTime.hashCode() in Hive. If the 
`INTERVAL` is backed as
+   * HiveIntervalYearMonth in Hive, then this method will not produce Hive 
compatible result.
+   * The reason being Spark's representation of calendar does not have 
such categories based on
+   * the interval and is unified.
+   */
+  def hashCalendarInterval(calendarInterval: CalendarInterval): Long = {
+val totalSeconds = calendarInterval.milliseconds() / 1000
--- End diff --

How does Hive deal with nanoseconds, if we divide it by 1000? 




[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

2017-03-03 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/17062#discussion_r104282564
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
+  test("hive-hash for CalendarInterval type") {
+def checkHiveHashForTimestampType(interval: String, expected: Long): 
Unit = {
+  checkHiveHash(CalendarInterval.fromString(interval), 
CalendarIntervalType, expected)
+}
+
+checkHiveHashForTimestampType("interval 1 day", 3220073)
+checkHiveHashForTimestampType("interval 6 day 15 hour", 21202073)
+checkHiveHashForTimestampType("interval -23 day 56 hour -113 
minute 9898989 second",
+  -2128468593)
--- End diff --

Could you add more test cases?

```
checkHiveHashForTimestampType("interval 0 day 0 hour 0 minute 0 
second", 23273)
checkHiveHashForTimestampType("interval 0 day 0 hour", 23273)
checkHiveHashForTimestampType("interval -1 day", 3220036)
```
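
For what it's worth, these suggested values are consistent with interpreting `HiveIntervalDayTime.hashCode()` as a commons-lang `HashCodeBuilder(17, 37)` fold over (totalSeconds, nanos) -- a hedged cross-check:

```
// Assumed shape of the Hive-side hash; not taken from either codebase verbatim.
def intervalHash(seconds: Long, nanos: Int): Int =
  (17 * 37 + (seconds ^ (seconds >>> 32)).toInt) * 37 + nanos

assert(intervalHash(0L, 0) == 23273)        // "interval 0 day 0 hour 0 minute 0 second"
assert(intervalHash(-86400L, 0) == 3220036) // "interval -1 day"
```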



[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

2017-02-27 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/17062#discussion_r103357272
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
+  test("hive-hash for CalendarInterval type") {
+    def checkHiveHashForTimestampType(interval: String, expected: Long): Unit = {
+      checkHiveHash(CalendarInterval.fromString(interval), CalendarIntervalType, expected)
+    }
+
+    checkHiveHashForTimestampType("interval 1 day", 3220073)
--- End diff --

SELECT HASH ( INTERVAL '1' DAY );





[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

2017-02-27 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/17062#discussion_r103357588
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
+  test("hive-hash for CalendarInterval type") {
+    def checkHiveHashForTimestampType(interval: String, expected: Long): Unit = {
+      checkHiveHash(CalendarInterval.fromString(interval), CalendarIntervalType, expected)
+    }
+
+    checkHiveHashForTimestampType("interval 1 day", 3220073)
+    checkHiveHashForTimestampType("interval 6 day 15 hour", 21202073)
+    checkHiveHashForTimestampType("interval -23 day 56 hour -113 minute 9898989 second",
--- End diff --

SELECT HASH ( INTERVAL '-23' DAY + INTERVAL '56' HOUR + INTERVAL '-113' MINUTE + INTERVAL '9898989' SECOND );



[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

2017-02-27 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/17062#discussion_r103300592
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
+    // Invalid input: bad timestamp string. Hive returns 0 for such cases
--- End diff --

Same as `Date`: invalid timestamp values are not allowed in Spark and parsing will fail. Hive will not fail but falls back to `null` and returns `0` as the hash value.
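
A minimal sketch of why the tests quoted above expect `NoSuchElementException` (assuming the catalyst `DateTimeUtils` API shown in the diff):

```
import java.util.TimeZone
import org.apache.spark.sql.catalyst.util.DateTimeUtils
import org.apache.spark.unsafe.types.UTF8String

// stringToTimestamp returns an Option; unparseable input yields None, and the
// test helper's .get on None is what throws NoSuchElementException.
val bad = DateTimeUtils.stringToTimestamp(
  UTF8String.fromString("2016-99-99 00:00:00"), TimeZone.getTimeZone("UTC"))
assert(bad.isEmpty) // Hive would instead fall back to null and hash it to 0
```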




[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

2017-02-27 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/17062#discussion_r103281696
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
+  test("hive-hash for date type") {
+    def checkHiveHashForDateType(dateString: String, expected: Long): Unit = {
+      checkHiveHash(
+        DateTimeUtils.stringToDate(UTF8String.fromString(dateString)).get,
+        DateType,
+        expected)
+    }
+
+    // basic case
+    checkHiveHashForDateType("2017-01-01", 17167)
--- End diff --

Expected values computed with Hive 1.2 using:

```
SELECT HASH( CAST( "2017-01-01" AS DATE) )
```
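
As a sanity check, the expected values in this test are consistent with the Hive date hash being simply the day count since the Unix epoch (note that "1970-01-01" hashes to 0 above):

```
// 2017-01-01 is day 17167 since 1970-01-01.
assert(java.time.LocalDate.of(2017, 1, 1).toEpochDay == 17167L)
```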




[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

2017-02-27 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/17062#discussion_r103300013
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
+    // Invalid input: bad date string. Hive returns 0 for such cases
--- End diff --

Spark does not allow creating a `Date` that does not fit its spec and throws an exception. Hive will not fail but falls back to `null` and returns `0` as the hash value.




[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

2017-02-27 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/17062#discussion_r103357472
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
+  test("hive-hash for CalendarInterval type") {
+    def checkHiveHashForTimestampType(interval: String, expected: Long): Unit = {
+      checkHiveHash(CalendarInterval.fromString(interval), CalendarIntervalType, expected)
+    }
+
+    checkHiveHashForTimestampType("interval 1 day", 3220073)
+    checkHiveHashForTimestampType("interval 6 day 15 hour", 21202073)
--- End diff --

SELECT HASH ( INTERVAL '6' DAY + INTERVAL '15' HOUR );
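
The expected value matches the same assumed `HashCodeBuilder(17, 37)` fold used elsewhere in this thread (6 days 15 hours = 572400 seconds, zero nanos):

```
val seconds = 6 * 86400L + 15 * 3600L // 572400
assert((17 * 37 + (seconds ^ (seconds >>> 32)).toInt) * 37 + 0 == 21202073)
```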




[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

2017-02-27 Thread tejasapatil
Github user tejasapatil commented on a diff in the pull request:

https://github.com/apache/spark/pull/17062#discussion_r103300293
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala ---
+  test("hive-hash for timestamp type") {
+    def checkHiveHashForTimestampType(
+        timestamp: String,
+        expected: Long,
+        timeZone: TimeZone = TimeZone.getTimeZone("UTC")): Unit = {
+      checkHiveHash(
+        DateTimeUtils.stringToTimestamp(UTF8String.fromString(timestamp), timeZone).get,
+        TimestampType,
+        expected)
+    }
+
+    // basic case
+    checkHiveHashForTimestampType("2017-02-24 10:56:29", 1445725271)
--- End diff --

Corresponding Hive query:
```
select HASH(CAST("2017-02-24 10:56:29" AS TIMESTAMP));
```

Note that this is with the system timezone set to UTC (`export TZ=/usr/share/zoneinfo/UTC`).
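
A hedged illustration of that timezone dependence, using the same parsing entry point as the test helper: the same wall-clock string parses to different epoch microseconds, and therefore different hashes, per zone:

```
import java.util.TimeZone
import org.apache.spark.sql.catalyst.util.DateTimeUtils
import org.apache.spark.unsafe.types.UTF8String

val s = UTF8String.fromString("2017-02-24 10:56:29")
val utc = DateTimeUtils.stringToTimestamp(s, TimeZone.getTimeZone("UTC")).get
val pst = DateTimeUtils.stringToTimestamp(s, TimeZone.getTimeZone("US/Pacific")).get
// US/Pacific is UTC-8 in late February (no DST yet), so the instant is 8h later.
assert(pst - utc == 8L * 3600 * 1000000)
```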




[GitHub] spark pull request #17062: [SPARK-17495] [SQL] Support date, timestamp and i...

2017-02-24 Thread tejasapatil
GitHub user tejasapatil opened a pull request:

https://github.com/apache/spark/pull/17062

[SPARK-17495] [SQL] Support date, timestamp and interval types in Hive hash

## What changes were proposed in this pull request?

- Timestamp hashing is done as per [TimestampWritable.hashCode()](https://github.com/apache/hive/blob/ff67cdda1c538dc65087878eeba3e165cf3230f4/serde/src/java/org/apache/hadoop/hive/serde2/io/TimestampWritable.java#L406) in Hive (see the sketch after this list).
- Interval hashing is done as per [HiveIntervalDayTime.hashCode()](https://github.com/apache/hive/blob/ff67cdda1c538dc65087878eeba3e165cf3230f4/storage-api/src/java/org/apache/hadoop/hive/common/type/HiveIntervalDayTime.java#L178). Note that there are inherent differences in how Hive and Spark store intervals under the hood, which limits the ability to be completely in sync with Hive's hashing function. I have explained this in the method doc.
- Date type was already supported. This PR adds tests for that.
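
A minimal sketch of the timestamp hashing approach, assuming Spark's internal `TimestampType` value is microseconds since the Unix epoch; it mirrors `TimestampWritable.hashCode()` with the nanosecond field limited to microsecond precision:

```
def hashTimestamp(timestampMicros: Long): Long = {
  val seconds = timestampMicros / 1000000L          // whole seconds since epoch
  val nanos = (timestampMicros % 1000000L) * 1000L  // leftover micros widened to nanos
  var result = seconds
  result <<= 30                                     // the nanosecond part fits in 30 bits
  result |= nanos
  ((result >>> 32) ^ result).toInt
}

assert(hashTimestamp(0L) == 0) // epoch hashes to 0, matching the unit tests
```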

## How was this patch tested?

Added unit tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tejasapatil/spark SPARK-17495_time_related_types

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/17062.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17062


commit cc359fc45547b7ba3fd4c1d11d3dcfbaf71ea66a
Author: Tejas Patil 
Date:   2017-02-25T00:18:03Z

[SPARK-17495] [SQL] Support date, timestamp datatypes in Hive hash

commit 332475c1641f61080aa41dda9f1ceec237351d75
Author: Tejas Patil 
Date:   2017-02-25T02:23:41Z

minor refac



