[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26190: [SPARK-29532][SQL] simplify interval string parsing
dongjoon-hyun commented on a change in pull request #26190: [SPARK-29532][SQL] simplify interval string parsing URL: https://github.com/apache/spark/pull/26190#discussion_r338666099 ## File path: sql/core/benchmarks/IntervalBenchmark-results.txt ## @@ -1,25 +1,25 @@ -Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.15 -Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_161-b12 on Mac OS X 10.14 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz cast strings to intervals:Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -string w/ interval 420435 18 2.4 419.8 1.0X -string w/o interval 359365 10 2.8 358.7 1.2X -1 units w/ interval 752759 8 1.3 752.0 0.6X -1 units w/o interval762766 4 1.3 762.0 0.6X -2 units w/ interval 961970 8 1.0 960.7 0.4X -2 units w/o interval970976 9 1.0 970.2 0.4X -3 units w/ interval1130 1136 7 0.91130.4 0.4X -3 units w/o interval 1150 1158 9 0.91150.3 0.4X -4 units w/ interval1333 1336 3 0.71333.5 0.3X -4 units w/o interval 1354 1359 4 0.71354.5 0.3X -5 units w/ interval1523 1525 2 0.71523.3 0.3X -5 units w/o interval 1549 1551 3 0.61549.4 0.3X -6 units w/ interval1661 1663 2 0.61660.8 0.3X -6 units w/o interval 1691 1704 13 0.61691.2 0.2X -7 units w/ interval1811 1817 8 0.61810.6 0.2X -7 units w/o interval 1853 1854 1 0.51853.2 0.2X -8 units w/ interval2029 2037 8 0.52028.7 0.2X -8 units w/o interval 2075 2075 1 0.52074.5 0.2X -9 units w/ interval2170 2175 5 0.52170.0 0.2X -9 units w/o interval 2204 2212 8 0.52203.6 0.2X +prepare string w/ interval 403419 18 2.5 403.1 1.0X +prepare string w/o interval 341353 21 2.9 341.1 1.2X +1 units w/ interval5154 5159 8 0.25153.5 0.1X +1 units w/o interval 4818 4833 20 0.24817.6 0.1X +2 units w/ interval6191 6223 41 0.26190.6 0.1X +2 units w/o interval 6236 6264 25 0.26235.7 0.1X +3 units w/ interval7397 7567 170 0.17397.0 0.1X +3 units w/o interval 7280 7367 76 0.17279.6 0.1X +4 units w/ interval8197 8228 27 0.18197.3 0.0X +4 units w/o interval 7977 7989 17 0.17977.3 0.1X +5 units w/ interval9089 9192 101 0.19088.8 0.0X +5 units w/o interval 8853 8858 5 0.18852.8 0.0X +6 units w/ interval9696 9720 23 0.19695.6 0.0X +6 units w/o interval 9509 9518 9 0.19509.4 0.0X +7 units w/ interval 10738
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26190: [SPARK-29532][SQL] simplify interval string parsing
dongjoon-hyun commented on a change in pull request #26190: [SPARK-29532][SQL] simplify interval string parsing URL: https://github.com/apache/spark/pull/26190#discussion_r338209548 ## File path: sql/core/benchmarks/IntervalBenchmark-results.txt ## @@ -1,25 +1,25 @@ -Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.15 -Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_161-b12 on Mac OS X 10.14 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz cast strings to intervals:Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -string w/ interval 420435 18 2.4 419.8 1.0X -string w/o interval 359365 10 2.8 358.7 1.2X -1 units w/ interval 752759 8 1.3 752.0 0.6X -1 units w/o interval762766 4 1.3 762.0 0.6X -2 units w/ interval 961970 8 1.0 960.7 0.4X -2 units w/o interval970976 9 1.0 970.2 0.4X -3 units w/ interval1130 1136 7 0.91130.4 0.4X -3 units w/o interval 1150 1158 9 0.91150.3 0.4X -4 units w/ interval1333 1336 3 0.71333.5 0.3X -4 units w/o interval 1354 1359 4 0.71354.5 0.3X -5 units w/ interval1523 1525 2 0.71523.3 0.3X -5 units w/o interval 1549 1551 3 0.61549.4 0.3X -6 units w/ interval1661 1663 2 0.61660.8 0.3X -6 units w/o interval 1691 1704 13 0.61691.2 0.2X -7 units w/ interval1811 1817 8 0.61810.6 0.2X -7 units w/o interval 1853 1854 1 0.51853.2 0.2X -8 units w/ interval2029 2037 8 0.52028.7 0.2X -8 units w/o interval 2075 2075 1 0.52074.5 0.2X -9 units w/ interval2170 2175 5 0.52170.0 0.2X -9 units w/o interval 2204 2212 8 0.52203.6 0.2X +prepare string w/ interval 403419 18 2.5 403.1 1.0X +prepare string w/o interval 341353 21 2.9 341.1 1.2X +1 units w/ interval5154 5159 8 0.25153.5 0.1X +1 units w/o interval 4818 4833 20 0.24817.6 0.1X +2 units w/ interval6191 6223 41 0.26190.6 0.1X +2 units w/o interval 6236 6264 25 0.26235.7 0.1X +3 units w/ interval7397 7567 170 0.17397.0 0.1X +3 units w/o interval 7280 7367 76 0.17279.6 0.1X +4 units w/ interval8197 8228 27 0.18197.3 0.0X +4 units w/o interval 7977 7989 17 0.17977.3 0.1X +5 units w/ interval9089 9192 101 0.19088.8 0.0X +5 units w/o interval 8853 8858 5 0.18852.8 0.0X +6 units w/ interval9696 9720 23 0.19695.6 0.0X +6 units w/o interval 9509 9518 9 0.19509.4 0.0X +7 units w/ interval 10738
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26190: [SPARK-29532][SQL] simplify interval string parsing
dongjoon-hyun commented on a change in pull request #26190: [SPARK-29532][SQL] simplify interval string parsing URL: https://github.com/apache/spark/pull/26190#discussion_r338206250 ## File path: sql/core/benchmarks/IntervalBenchmark-results.txt ## @@ -1,25 +1,25 @@ -Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.15 -Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz +Java HotSpot(TM) 64-Bit Server VM 1.8.0_161-b12 on Mac OS X 10.14 +Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz cast strings to intervals:Best Time(ms) Avg Time(ms) Stdev(ms)Rate(M/s) Per Row(ns) Relative -string w/ interval 420435 18 2.4 419.8 1.0X -string w/o interval 359365 10 2.8 358.7 1.2X -1 units w/ interval 752759 8 1.3 752.0 0.6X -1 units w/o interval762766 4 1.3 762.0 0.6X -2 units w/ interval 961970 8 1.0 960.7 0.4X -2 units w/o interval970976 9 1.0 970.2 0.4X -3 units w/ interval1130 1136 7 0.91130.4 0.4X -3 units w/o interval 1150 1158 9 0.91150.3 0.4X -4 units w/ interval1333 1336 3 0.71333.5 0.3X -4 units w/o interval 1354 1359 4 0.71354.5 0.3X -5 units w/ interval1523 1525 2 0.71523.3 0.3X -5 units w/o interval 1549 1551 3 0.61549.4 0.3X -6 units w/ interval1661 1663 2 0.61660.8 0.3X -6 units w/o interval 1691 1704 13 0.61691.2 0.2X -7 units w/ interval1811 1817 8 0.61810.6 0.2X -7 units w/o interval 1853 1854 1 0.51853.2 0.2X -8 units w/ interval2029 2037 8 0.52028.7 0.2X -8 units w/o interval 2075 2075 1 0.52074.5 0.2X -9 units w/ interval2170 2175 5 0.52170.0 0.2X -9 units w/o interval 2204 2212 8 0.52203.6 0.2X +prepare string w/ interval 403419 18 2.5 403.1 1.0X +prepare string w/o interval 341353 21 2.9 341.1 1.2X +1 units w/ interval5154 5159 8 0.25153.5 0.1X +1 units w/o interval 4818 4833 20 0.24817.6 0.1X +2 units w/ interval6191 6223 41 0.26190.6 0.1X +2 units w/o interval 6236 6264 25 0.26235.7 0.1X +3 units w/ interval7397 7567 170 0.17397.0 0.1X +3 units w/o interval 7280 7367 76 0.17279.6 0.1X +4 units w/ interval8197 8228 27 0.18197.3 0.0X +4 units w/o interval 7977 7989 17 0.17977.3 0.1X +5 units w/ interval9089 9192 101 0.19088.8 0.0X +5 units w/o interval 8853 8858 5 0.18852.8 0.0X +6 units w/ interval9696 9720 23 0.19695.6 0.0X +6 units w/o interval 9509 9518 9 0.19509.4 0.0X +7 units w/ interval 10738
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26190: [SPARK-29532][SQL] simplify interval string parsing
dongjoon-hyun commented on a change in pull request #26190: [SPARK-29532][SQL] simplify interval string parsing URL: https://github.com/apache/spark/pull/26190#discussion_r338205376 ## File path: sql/core/benchmarks/IntervalBenchmark-results.txt ## @@ -1,25 +1,25 @@ -Java HotSpot(TM) 64-Bit Server VM 1.8.0_202-b08 on Mac OS X 10.15 -Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz Review comment: Oh, the original benchmark result is on MacOS. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26190: [SPARK-29532][SQL] simplify interval string parsing
dongjoon-hyun commented on a change in pull request #26190: [SPARK-29532][SQL] simplify interval string parsing URL: https://github.com/apache/spark/pull/26190#discussion_r337953103 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserInterface.scala ## @@ -70,4 +71,10 @@ trait ParserInterface { */ @throws[ParseException]("Text cannot be parsed to a DataType") def parseDataType(sqlText: String): DataType + + /** + * Parse a string to a [[CalendarInterval]]. + */ + @throws[ParseException]("Text cannot be parsed to an interval") + def parseInterval(sqlText: String): CalendarInterval Review comment: Thanks! Of course, we can change it~ What we need is just having two separate PRs to distinguish them clearly. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26190: [SPARK-29532][SQL] simplify interval string parsing
dongjoon-hyun commented on a change in pull request #26190: [SPARK-29532][SQL] simplify interval string parsing URL: https://github.com/apache/spark/pull/26190#discussion_r337880333 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserInterface.scala ## @@ -70,4 +71,10 @@ trait ParserInterface { */ @throws[ParseException]("Text cannot be parsed to a DataType") def parseDataType(sqlText: String): DataType + + /** + * Parse a string to a [[CalendarInterval]]. + */ + @throws[ParseException]("Text cannot be parsed to an interval") + def parseInterval(sqlText: String): CalendarInterval Review comment: @cloud-fan . This looks like a breaking change to the existing `Parser` `Spark Extension`. If then, this need a new JIRA. Can we have this change in a separate JIRA and PR because this is an open developer API? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26190: [SPARK-29532][SQL] simplify interval string parsing
dongjoon-hyun commented on a change in pull request #26190: [SPARK-29532][SQL] simplify interval string parsing URL: https://github.com/apache/spark/pull/26190#discussion_r337880333 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserInterface.scala ## @@ -70,4 +71,10 @@ trait ParserInterface { */ @throws[ParseException]("Text cannot be parsed to a DataType") def parseDataType(sqlText: String): DataType + + /** + * Parse a string to a [[CalendarInterval]]. + */ + @throws[ParseException]("Text cannot be parsed to an interval") + def parseInterval(sqlText: String): CalendarInterval Review comment: @cloud-fan . This looks like a breaking change to the existing `Parser` `Spark Extension`. If then, this need a new JIRA. Can we have this change in a separate JIRA PR because this is an open developer API? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26190: [SPARK-29532][SQL] simplify interval string parsing
dongjoon-hyun commented on a change in pull request #26190: [SPARK-29532][SQL] simplify interval string parsing URL: https://github.com/apache/spark/pull/26190#discussion_r337880333 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParserInterface.scala ## @@ -70,4 +71,10 @@ trait ParserInterface { */ @throws[ParseException]("Text cannot be parsed to a DataType") def parseDataType(sqlText: String): DataType + + /** + * Parse a string to a [[CalendarInterval]]. + */ + @throws[ParseException]("Text cannot be parsed to an interval") + def parseInterval(sqlText: String): CalendarInterval Review comment: @cloud-fan . This looks like a breaking change to the existing `Parser` Spark Extension. If then, this need a new JIRA. Can we have this change in a separate JIRA PR because this is an open developer API? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26190: [SPARK-29532][SQL] simplify interval string parsing
dongjoon-hyun commented on a change in pull request #26190: [SPARK-29532][SQL] simplify interval string parsing URL: https://github.com/apache/spark/pull/26190#discussion_r337252065 ## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ## @@ -79,6 +79,10 @@ singleTableSchema : colTypeList EOF ; +singleInterval +: INTERVAL? (intervalValue intervalUnit)+ EOF Review comment: Thank you for pinging me, @cloud-fan . It seems to be a corner case which we overlooked before. +1 for banning the case `SELECT INTERVAL 'INTERVAL 1 year'` in Apache Spark 3.0. What we need is only `SELECT INTERVAL '1 year'`. In other words, - `SELECT INTERVAL 'INTERVAL 1 year'` (X) - `SELECT INTERVAL '1 year'` (O) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26190: [SPARK-29532][SQL] simplify interval string parsing
dongjoon-hyun commented on a change in pull request #26190: [SPARK-29532][SQL] simplify interval string parsing URL: https://github.com/apache/spark/pull/26190#discussion_r337252065 ## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ## @@ -79,6 +79,10 @@ singleTableSchema : colTypeList EOF ; +singleInterval +: INTERVAL? (intervalValue intervalUnit)+ EOF Review comment: Thank you for pinging me, @cloud-fan . It must be a corner case which we overlooked before. +1 for banning the case `SELECT INTERVAL 'INTERVAL 1 year'` in Apache Spark 3.0. What we need is only `SELECT INTERVAL '1 year'`. In other words, - `SELECT INTERVAL 'INTERVAL 1 year'` (X) - `SELECT INTERVAL '1 year'` (O) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #26190: [SPARK-29532][SQL] simplify interval string parsing
dongjoon-hyun commented on a change in pull request #26190: [SPARK-29532][SQL] simplify interval string parsing URL: https://github.com/apache/spark/pull/26190#discussion_r337252065 ## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 ## @@ -79,6 +79,10 @@ singleTableSchema : colTypeList EOF ; +singleInterval +: INTERVAL? (intervalValue intervalUnit)+ EOF Review comment: Thank you for pinging me, @cloud-fan . It must be a corner case which we overlooked before. +1 for banning the case `SELECT INTERVAL 'INTERVAL 1 year'` in Apache Spark 3.0. What we need are only `SELECT INTERVAL '1 year'` and `SELECT 'INTERVAL 1 year'`. In other words, - `SELECT INTERVAL 'INTERVAL 1 year'` (X) - `SELECT INTERVAL '1 year'` (O) - `SELECT 'INTERVAL 1 year'` (O) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org