[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user wangyum closed the pull request at: https://github.com/apache/spark/pull/18106 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r130837526 --- Diff: python/pyspark/sql/functions.py --- @@ -1028,20 +1028,29 @@ def to_timestamp(col, format=None): @since(1.5) -def trunc(date, format): +def trunc(data, truncParam): --- End diff -- Yes, but it adds complexity for both args and kwargs, e.g. handling the case when both are set, the method signature in the docs, etc. I wonder if it is that important. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r130833154 --- Diff: python/pyspark/sql/functions.py --- @@ -1028,20 +1028,29 @@ def to_timestamp(col, format=None): @since(1.5) -def trunc(date, format): +def trunc(data, truncParam): --- End diff -- We can work around this with kwargs if it's important to change the name.
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r130801952 --- Diff: python/pyspark/sql/functions.py --- @@ -1028,20 +1028,29 @@ def to_timestamp(col, format=None): @since(1.5) -def trunc(date, format): +def trunc(data, truncParam): --- End diff -- @wangyum, would you mind reverting this renaming? It breaks compatibility if a user script calls it as ```python trunc(..., format= ...) trunc(date=..., format= ...) ```
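The incompatibility under discussion is a plain Python issue: renaming a positional parameter breaks every caller that passes it by keyword. A minimal sketch of the kwargs workaround holdenk mentions below, using hypothetical names and not Spark's actual implementation, could look like:

```python
# Sketch of a backward-compatible parameter rename: accept the legacy
# keyword names (`date`, `format`) as aliases for the new ones.
# Hypothetical helper, not the actual pyspark.sql.functions.trunc.
def trunc(data=None, truncParam=None, **kwargs):
    if data is None:
        data = kwargs.pop("date", None)          # legacy keyword name
    if truncParam is None:
        truncParam = kwargs.pop("format", None)  # legacy keyword name
    if kwargs:
        raise TypeError("unexpected keyword arguments: %s" % sorted(kwargs))
    return (data, truncParam)  # stand-in for the real truncation logic

# Both the old and the new call styles resolve to the same arguments:
old_style = trunc(date="1997-02-28", format="year")
new_style = trunc("1997-02-28", "year")
```

This keeps old scripts working, at the cost of a less readable documented signature and extra logic for conflicting arguments, which is the complexity trade-off debated in this thread.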
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r124329456 --- Diff: python/pyspark/sql/functions.py --- @@ -1028,20 +1028,29 @@ def to_timestamp(col, format=None): @since(1.5) -def trunc(date, format): +def trunc(data, truncParam): --- End diff -- I believe this definitely breaks backward compatibility for keyword-argument usage in Python.
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r124328743 --- Diff: python/pyspark/sql/functions.py --- @@ -1028,20 +1028,29 @@ def to_timestamp(col, format=None): @since(1.5) -def trunc(date, format): +def trunc(data, truncParam): --- End diff -- @ueshin @holdenk re: changing param name in python.
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123920115 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -132,3 +133,154 @@ case class Uuid() extends LeafExpression { s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", isNull = "false") } } + +/** + * Returns date truncated to the unit specified by the format or + * numeric truncated to scale decimal places. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = """ + _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`. +If `data` is DateType, returns `data` with the time portion of the day truncated to the unit specified by the format model `fmt`. +If `data` is DecimalType/DoubleType, returns `data` truncated to `fmt` decimal places. + """, + extended = """ +Examples: + > SELECT _FUNC_('2009-02-12', 'MM'); --- End diff -- Yes, this is what I worry about.
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123919660 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -132,3 +133,154 @@ case class Uuid() extends LeafExpression { s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", isNull = "false") } } + +/** + * Returns date truncated to the unit specified by the format or + * numeric truncated to scale decimal places. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = """ + _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`. +If `data` is DateType, returns `data` with the time portion of the day truncated to the unit specified by the format model `fmt`. +If `data` is DecimalType/DoubleType, returns `data` truncated to `fmt` decimal places. + """, + extended = """ +Examples: + > SELECT _FUNC_('2009-02-12', 'MM'); + 2009-02-01. + > SELECT _FUNC_('2015-10-27', 'YEAR'); + 2015-01-01 + > SELECT _FUNC_('1989-03-13'); + 1989-03-01 + > SELECT _FUNC_(1234567891.1234567891, 4); + 1234567891.1234 + > SELECT _FUNC_(1234567891.1234567891, -4); + 123456 + > SELECT _FUNC_(1234567891.1234567891); + 1234567891 + """) +// scalastyle:on line.size.limit +case class Trunc(data: Expression, format: Expression) + extends BinaryExpression with ExpectsInputTypes { + + def this(data: Expression) = { +this(data, Literal(if (data.dataType.isInstanceOf[DateType]) "MM" else 0)) + } + + override def left: Expression = data + override def right: Expression = format + + override def dataType: DataType = data.dataType + + override def inputTypes: Seq[AbstractDataType] = dataType match { +case NullType => Seq(dataType, TypeCollection(StringType, IntegerType)) +case DateType => Seq(dataType, StringType) +case DoubleType | DecimalType.Fixed(_, _) => Seq(dataType, IntegerType) +case _ => Seq(TypeCollection(DateType, DoubleType, DecimalType), --- End diff -- Added this case to show all supported types: ``` > select trunc(false, 'MON'); Error in query: cannot resolve 'trunc(false, 'MON')' due to data type mismatch: argument 1 requires (date or double or decimal) type, however, 'false' is of boolean type.; line 1 pos 7; 'Project [unresolvedalias(trunc(false, MON), None)] +- OneRowRelation$ ```
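The error message above comes from the analyzer's input-type check. A rough Python model of that check, hypothetical and much simplified relative to the Scala `inputTypes` machinery, is:

```python
import datetime
import decimal

def check_trunc_arg(data):
    # Argument 1 must be a date, double, or decimal -- anything else
    # (e.g. a boolean) is rejected, mirroring the analysis error above.
    # Note: bool is not a subclass of float, so True/False fail this check.
    if isinstance(data, (datetime.date, float, decimal.Decimal)):
        return data
    raise TypeError(
        "cannot resolve 'trunc': argument 1 requires (date or double or "
        "decimal) type, however, %r is of %s type" % (data, type(data).__name__))
```

For example, `check_trunc_arg(False)` raises a `TypeError`, corresponding to the `trunc(false, 'MON')` analysis error quoted above.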
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123884847 --- Diff: python/pyspark/sql/functions.py --- @@ -1028,20 +1028,28 @@ def to_timestamp(col, format=None): @since(1.5) -def trunc(date, format): +def trunc(data, format): """ -Returns date truncated to the unit specified by the format. +Returns date truncated to the unit specified by the format or +number truncated by specified decimal places. :param format: 'year', '', 'yy' or 'month', 'mon', 'mm' >>> df = spark.createDataFrame([('1997-02-28',)], ['d']) ->>> df.select(trunc(df.d, 'year').alias('year')).collect() +>>> df.select(trunc(to_date(df.d), 'year').alias('year')).collect() [Row(year=datetime.date(1997, 1, 1))] ->>> df.select(trunc(df.d, 'mon').alias('month')).collect() +>>> df.select(trunc(to_date(df.d), 'mon').alias('month')).collect() --- End diff -- this could be a bigger problem :)
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123884830 --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R --- @@ -1382,8 +1382,8 @@ test_that("column functions", { c20 <- to_timestamp(c) + to_timestamp(c, "") + to_date(c, "") c21 <- posexplode_outer(c) + explode_outer(c) c22 <- not(c) - c23 <- trunc(c, "year") + trunc(c, "") + trunc(c, "yy") + -trunc(c, "month") + trunc(c, "mon") + trunc(c, "mm") + c23 <- trunc(to_date(c), "year") + trunc(to_date(c), "") + trunc(to_date(c), "yy") + +trunc(to_date(c), "month") + trunc(to_date(c), "mon") + trunc(to_date(c), "mm") --- End diff -- That's a good point. Fortunately (?), trunc was only added to R in 2.3.0, so I think we need to make sure (manually, and with a unit test) that trunc works on both date columns and numeric columns.
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123871837 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MiscExpressionsSuite.scala --- @@ -44,4 +46,49 @@ class MiscExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { assert(evaluate(Uuid()) !== evaluate(Uuid())) } + test("trunc") { +// numeric +def testTruncNumber(input: Double, fmt: Int, expected: Double): Unit = { + checkEvaluation(Trunc(Literal.create(input, DoubleType), +Literal.create(fmt, IntegerType)), +expected) + checkEvaluation(Trunc(Literal.create(input, DoubleType), +NonFoldableLiteral.create(fmt, IntegerType)), +expected) +} + +testTruncNumber(1234567891.1234567891, 4, 1234567891.1234) +testTruncNumber(1234567891.1234567891, -4, 123456) +testTruncNumber(1234567891.1234567891, 0, 1234567891) --- End diff -- Also check testTruncNumber(0.1234567891, -1, 0.1234567891)?
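For reference, the numeric semantics these tests encode can be modeled in a few lines of Python. This is a sketch of truncation toward zero at a given scale, not the PR's implementation; note that for a negative scale the Hive/Oracle convention zero-fills the integer digits (trunc(1234567891.1234, -4) gives 1234560000), whereas the PR's quoted examples print 123456. The sketch follows the Hive/Oracle convention.

```python
import math

def trunc_number(x, scale):
    # Truncate x toward zero, keeping `scale` decimal places; a negative
    # scale zeroes out digits to the left of the decimal point.
    if scale >= 0:
        factor = 10 ** scale  # exact integer power avoids float factors
        return math.trunc(x * factor) / factor
    factor = 10 ** (-scale)
    return float(math.trunc(x / factor) * factor)
```

Working the thread's cases through this model: scale 4 keeps four decimal places, scale 0 drops the fraction entirely, and scale -4 zeroes the last four integer digits.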
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123871820 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MiscExpressionsSuite.scala --- @@ -44,4 +46,49 @@ class MiscExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { assert(evaluate(Uuid()) !== evaluate(Uuid())) } + test("trunc") { +// numeric +def testTruncNumber(input: Double, fmt: Int, expected: Double): Unit = { + checkEvaluation(Trunc(Literal.create(input, DoubleType), +Literal.create(fmt, IntegerType)), +expected) + checkEvaluation(Trunc(Literal.create(input, DoubleType), +NonFoldableLiteral.create(fmt, IntegerType)), +expected) +} + +testTruncNumber(1234567891.1234567891, 4, 1234567891.1234) +testTruncNumber(1234567891.1234567891, -4, 123456) +testTruncNumber(1234567891.1234567891, 0, 1234567891) + +checkEvaluation(Trunc(Literal.create(1D, DoubleType), + NonFoldableLiteral.create(null, IntegerType)), + null) +checkEvaluation(Trunc(Literal.create(null, DoubleType), + NonFoldableLiteral.create(1, IntegerType)), + null) +checkEvaluation(Trunc(Literal.create(null, DoubleType), + NonFoldableLiteral.create(null, IntegerType)), + null) + +// date --- End diff -- Shall we split this test into two tests for numeric and date respectively?
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123831200 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -132,3 +133,154 @@ case class Uuid() extends LeafExpression { s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", isNull = "false") } } + +/** + * Returns date truncated to the unit specified by the format or + * numeric truncated to scale decimal places. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = """ + _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`. +If `data` is DateType, returns `data` with the time portion of the day truncated to the unit specified by the format model `fmt`. +If `data` is DecimalType/DoubleType, returns `data` truncated to `fmt` decimal places. + """, + extended = """ +Examples: + > SELECT _FUNC_('2009-02-12', 'MM'); --- End diff -- I guess this also drops support for other types (e.g., timestamp), since we no longer allow implicit casts (e.g., `SELECT trunc(timestamp('2009-02-12'), 'MM')`).
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123831034 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -132,3 +133,154 @@ case class Uuid() extends LeafExpression { s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", isNull = "false") } } + +/** + * Returns date truncated to the unit specified by the format or + * numeric truncated to scale decimal places. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = """ + _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`. +If `data` is DateType, returns `data` with the time portion of the day truncated to the unit specified by the format model `fmt`. +If `data` is DecimalType/DoubleType, returns `data` truncated to `fmt` decimal places. --- End diff -- Also, I would describe the types without class names (e.g., date).
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123828403 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2067,6 +2067,18 @@ object functions { */ def radians(columnName: String): Column = radians(Column(columnName)) + /** + * returns number truncated by specified decimal places. --- End diff -- little nit: `returns` -> `Returns`
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123828339 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2067,6 +2067,18 @@ object functions { */ def radians(columnName: String): Column = radians(Column(columnName)) + /** + * returns number truncated by specified decimal places. + * + * @param scale: 4. -4, 0 --- End diff -- Let's describe the param.
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123827347 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2067,6 +2067,18 @@ object functions { */ def radians(columnName: String): Column = radians(Column(columnName)) + /** + * returns number truncated by specified decimal places. + * + * @param scale: 4. -4, 0 + * + * @group math_funcs + * @since 2.3.0 + */ + def trunc(db: Column, scale: Int = 0): Column = withExpr { --- End diff -- I would avoid using a default value here unless we are very sure about it (e.g., that other languages can call it without any problem, that it preserves the method signature, that it does not break binary compatibility, etc.).
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123824736 --- Diff: python/pyspark/sql/functions.py --- @@ -1028,20 +1028,28 @@ def to_timestamp(col, format=None): @since(1.5) -def trunc(date, format): +def trunc(data, format): """ -Returns date truncated to the unit specified by the format. +Returns date truncated to the unit specified by the format or +number truncated by specified decimal places. :param format: 'year', '', 'yy' or 'month', 'mon', 'mm' >>> df = spark.createDataFrame([('1997-02-28',)], ['d']) ->>> df.select(trunc(df.d, 'year').alias('year')).collect() +>>> df.select(trunc(to_date(df.d), 'year').alias('year')).collect() [Row(year=datetime.date(1997, 1, 1))] ->>> df.select(trunc(df.d, 'mon').alias('month')).collect() +>>> df.select(trunc(to_date(df.d), 'mon').alias('month')).collect() --- End diff -- Here too. If this change is needed to pass the Python tests, it looks like it breaks compatibility.
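The doctest behaviour in the diff above ('year' snapping to January 1, 'mon' to the first of the month) can be modeled directly on `datetime.date`. This is a sketch of the documented format semantics, not Spark's implementation:

```python
import datetime

def trunc_date(d, fmt):
    # 'year' family -> January 1 of the same year;
    # 'month' family -> first day of the same month;
    # anything else -> None, as the SQL function returns null.
    f = fmt.lower()
    if f in ("year", "yyyy", "yy"):
        return d.replace(month=1, day=1)
    if f in ("month", "mon", "mm"):
        return d.replace(day=1)
    return None
```

With this model, `trunc_date(datetime.date(1997, 2, 28), "year")` gives `datetime.date(1997, 1, 1)`, matching the `Row(year=datetime.date(1997, 1, 1))` in the doctest.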
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123822928 --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R --- @@ -1382,8 +1382,8 @@ test_that("column functions", { c20 <- to_timestamp(c) + to_timestamp(c, "") + to_date(c, "") c21 <- posexplode_outer(c) + explode_outer(c) c22 <- not(c) - c23 <- trunc(c, "year") + trunc(c, "") + trunc(c, "yy") + -trunc(c, "month") + trunc(c, "mon") + trunc(c, "mm") + c23 <- trunc(to_date(c), "year") + trunc(to_date(c), "") + trunc(to_date(c), "yy") + +trunc(to_date(c), "month") + trunc(to_date(c), "mon") + trunc(to_date(c), "mm") --- End diff -- Ah, just to be clear, I meant https://github.com/apache/spark/pull/18106/files/7fee61b1e084a1ae9966e7ad62b1509085b24151#r123758795. The current state looks like it drops support for calling this function with a string that can be implicitly cast to a date, so `to_date` is required, which breaks backward compatibility.
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123798230 --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R --- @@ -1382,8 +1382,8 @@ test_that("column functions", { c20 <- to_timestamp(c) + to_timestamp(c, "") + to_date(c, "") c21 <- posexplode_outer(c) + explode_outer(c) c22 <- not(c) - c23 <- trunc(c, "year") + trunc(c, "") + trunc(c, "yy") + -trunc(c, "month") + trunc(c, "mon") + trunc(c, "mm") + c23 <- trunc(to_date(c), "year") + trunc(to_date(c), "") + trunc(to_date(c), "yy") + +trunc(to_date(c), "month") + trunc(to_date(c), "mon") + trunc(to_date(c), "mm") --- End diff -- R should support both date and number too
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123759792 --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R --- @@ -1382,8 +1382,8 @@ test_that("column functions", { c20 <- to_timestamp(c) + to_timestamp(c, "") + to_date(c, "") c21 <- posexplode_outer(c) + explode_outer(c) c22 <- not(c) - c23 <- trunc(c, "year") + trunc(c, "") + trunc(c, "yy") + -trunc(c, "month") + trunc(c, "mon") + trunc(c, "mm") + c23 <- trunc(to_date(c), "year") + trunc(to_date(c), "") + trunc(to_date(c), "yy") + +trunc(to_date(c), "month") + trunc(to_date(c), "mon") + trunc(to_date(c), "mm") --- End diff -- y. I don't think we should do this.
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123759339 --- Diff: R/pkg/tests/fulltests/test_sparkSQL.R --- @@ -1382,8 +1382,8 @@ test_that("column functions", { c20 <- to_timestamp(c) + to_timestamp(c, "") + to_date(c, "") c21 <- posexplode_outer(c) + explode_outer(c) c22 <- not(c) - c23 <- trunc(c, "year") + trunc(c, "") + trunc(c, "yy") + -trunc(c, "month") + trunc(c, "mon") + trunc(c, "mm") + c23 <- trunc(to_date(c), "year") + trunc(to_date(c), "") + trunc(to_date(c), "yy") + +trunc(to_date(c), "month") + trunc(to_date(c), "mon") + trunc(to_date(c), "mm") --- End diff -- Isn't this actually a behaviour change?
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123758795 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -132,3 +133,154 @@ case class Uuid() extends LeafExpression { s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", isNull = "false") } } + +/** + * Returns date truncated to the unit specified by the format or + * numeric truncated to scale decimal places. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = """ + _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`. +If `data` is DateType, returns `data` with the time portion of the day truncated to the unit specified by the format model `fmt`. +If `data` is DecimalType/DoubleType, returns `data` truncated to `fmt` decimal places. + """, + extended = """ +Examples: + > SELECT _FUNC_('2009-02-12', 'MM'); --- End diff -- And I don't think we should drop this support.
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123758531 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -132,3 +133,154 @@ case class Uuid() extends LeafExpression { s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", isNull = "false") } } + +/** + * Returns date truncated to the unit specified by the format or + * numeric truncated to scale decimal places. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = """ + _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`. +If `data` is DateType, returns `data` with the time portion of the day truncated to the unit specified by the format model `fmt`. +If `data` is DecimalType/DoubleType, returns `data` truncated to `fmt` decimal places. --- End diff -- As it doesn't extend `ImplicitCastInputTypes` anymore, I think you can't directly use a string as the date parameter?
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123757175 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -132,3 +133,154 @@ case class Uuid() extends LeafExpression { s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", isNull = "false") } } + +/** + * Returns date truncated to the unit specified by the format or + * numeric truncated to scale decimal places. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = """ + _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`. +If `data` is DateType, returns `data` with the time portion of the day truncated to the unit specified by the format model `fmt`. +If `data` is DecimalType/DoubleType, returns `data` truncated to `fmt` decimal places. + """, + extended = """ +Examples: + > SELECT _FUNC_('2009-02-12', 'MM'); + 2009-02-01. + > SELECT _FUNC_('2015-10-27', 'YEAR'); + 2015-01-01 + > SELECT _FUNC_('1989-03-13'); + 1989-03-01 + > SELECT _FUNC_(1234567891.1234567891, 4); + 1234567891.1234 + > SELECT _FUNC_(1234567891.1234567891, -4); + 123456 + > SELECT _FUNC_(1234567891.1234567891); + 1234567891 + """) +// scalastyle:on line.size.limit +case class Trunc(data: Expression, format: Expression) + extends BinaryExpression with ExpectsInputTypes { + + def this(data: Expression) = { +this(data, Literal(if (data.dataType.isInstanceOf[DateType]) "MM" else 0)) + } + + override def left: Expression = data + override def right: Expression = format + + override def dataType: DataType = data.dataType + + override def inputTypes: Seq[AbstractDataType] = dataType match { +case NullType => Seq(dataType, TypeCollection(StringType, IntegerType)) +case DateType => Seq(dataType, StringType) +case DoubleType | DecimalType.Fixed(_, _) => Seq(dataType, IntegerType) +case _ => Seq(TypeCollection(DateType, DoubleType, DecimalType), + TypeCollection(StringType, IntegerType)) + 
} + + override def nullable: Boolean = true + + override def prettyName: String = "trunc" + + private val isTruncNumber = +(dataType.isInstanceOf[DoubleType] || dataType.isInstanceOf[DecimalType]) && + format.dataType.isInstanceOf[IntegerType] --- End diff -- Combined with this and `inputTypes`, once we have an input type combination like (DoubleType, StringType), it looks like `eval` simply returns null. That does not seem like a correct result to me. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
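For readers following the thread, here is a minimal Python sketch (hypothetical, not the Spark code) of the behavior being flagged: with two independent `isTruncNumber`/`isTruncDate`-style guards, a mismatched pair such as (double, string) satisfies neither branch and silently evaluates to null.

```python
import datetime

def trunc_eval(data, fmt):
    # Two independent type guards, mirroring isTruncNumber / isTruncDate
    is_trunc_number = isinstance(data, float) and isinstance(fmt, int)
    is_trunc_date = isinstance(data, datetime.date) and isinstance(fmt, str)
    if is_trunc_number:
        factor = 10 ** fmt
        return int(data * factor) / factor        # truncate toward zero
    if is_trunc_date and fmt.upper() in ("MM", "MON", "MONTH"):
        return data.replace(day=1)                # truncate to first of month
    return None  # mismatched pairs such as (float, str) silently become null

print(trunc_eval(3.14159, 2))     # 3.14
print(trunc_eval(3.14159, "MM"))  # None -- the silent null the reviewer objects to
```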
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123756528 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -132,3 +133,154 @@ case class Uuid() extends LeafExpression { s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", isNull = "false") } } + +/** + * Returns date truncated to the unit specified by the format or + * numeric truncated to scale decimal places. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = """ + _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`. +If `data` is DateType, returns `data` with the time portion of the day truncated to the unit specified by the format model `fmt`. +If `data` is DecimalType/DoubleType, returns `data` truncated to `fmt` decimal places. + """, + extended = """ +Examples: + > SELECT _FUNC_('2009-02-12', 'MM'); + 2009-02-01. + > SELECT _FUNC_('2015-10-27', 'YEAR'); + 2015-01-01 + > SELECT _FUNC_('1989-03-13'); + 1989-03-01 + > SELECT _FUNC_(1234567891.1234567891, 4); + 1234567891.1234 + > SELECT _FUNC_(1234567891.1234567891, -4); + 123456 + > SELECT _FUNC_(1234567891.1234567891); + 1234567891 + """) +// scalastyle:on line.size.limit +case class Trunc(data: Expression, format: Expression) + extends BinaryExpression with ExpectsInputTypes { + + def this(data: Expression) = { +this(data, Literal(if (data.dataType.isInstanceOf[DateType]) "MM" else 0)) + } + + override def left: Expression = data + override def right: Expression = format + + override def dataType: DataType = data.dataType + + override def inputTypes: Seq[AbstractDataType] = dataType match { +case NullType => Seq(dataType, TypeCollection(StringType, IntegerType)) +case DateType => Seq(dataType, StringType) +case DoubleType | DecimalType.Fixed(_, _) => Seq(dataType, IntegerType) +case _ => Seq(TypeCollection(DateType, DoubleType, DecimalType), + TypeCollection(StringType, IntegerType)) + 
} + + override def nullable: Boolean = true + + override def prettyName: String = "trunc" + + private val isTruncNumber = +(dataType.isInstanceOf[DoubleType] || dataType.isInstanceOf[DecimalType]) && + format.dataType.isInstanceOf[IntegerType] + private val isTruncDate = +dataType.isInstanceOf[DateType] && format.dataType.isInstanceOf[StringType] + + private lazy val truncFormat: Int = if (isTruncNumber) { +format.eval().asInstanceOf[Int] + } else if (isTruncDate) { +DateTimeUtils.parseTruncLevel(format.eval().asInstanceOf[UTF8String]) + } else { +0 + } + + override def eval(input: InternalRow): Any = { --- End diff -- override `nullSafeEval`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
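The `nullSafeEval` suggestion refers to Catalyst's convention where the base `BinaryExpression.eval` handles null children once, so subclasses implement only the non-null case. A hypothetical Python analogue of that split (illustrative names, not Spark's API):

```python
def null_safe(f):
    # Propagate null (None) for any null argument; evaluate f otherwise.
    # This is the check BinaryExpression.eval centralizes so that
    # subclasses only need to override the non-null path (nullSafeEval).
    def wrapper(*args):
        if any(a is None for a in args):
            return None
        return f(*args)
    return wrapper

@null_safe
def trunc_number(value, scale):
    # Only the non-null case lives here
    factor = 10 ** scale
    return int(value * factor) / factor

print(trunc_number(3.14159, 2))  # 3.14
print(trunc_number(None, 2))     # None
```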
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123755745 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -132,3 +133,154 @@ case class Uuid() extends LeafExpression { s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", isNull = "false") } } + +/** + * Returns date truncated to the unit specified by the format or + * numeric truncated to scale decimal places. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = """ + _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`. +If `data` is DateType, returns `data` with the time portion of the day truncated to the unit specified by the format model `fmt`. +If `data` is DecimalType/DoubleType, returns `data` truncated to `fmt` decimal places. + """, + extended = """ +Examples: + > SELECT _FUNC_('2009-02-12', 'MM'); + 2009-02-01. + > SELECT _FUNC_('2015-10-27', 'YEAR'); + 2015-01-01 + > SELECT _FUNC_('1989-03-13'); + 1989-03-01 + > SELECT _FUNC_(1234567891.1234567891, 4); + 1234567891.1234 + > SELECT _FUNC_(1234567891.1234567891, -4); + 123456 + > SELECT _FUNC_(1234567891.1234567891); + 1234567891 + """) +// scalastyle:on line.size.limit +case class Trunc(data: Expression, format: Expression) + extends BinaryExpression with ExpectsInputTypes { + + def this(data: Expression) = { +this(data, Literal(if (data.dataType.isInstanceOf[DateType]) "MM" else 0)) + } + + override def left: Expression = data + override def right: Expression = format + + override def dataType: DataType = data.dataType + + override def inputTypes: Seq[AbstractDataType] = dataType match { +case NullType => Seq(dataType, TypeCollection(StringType, IntegerType)) +case DateType => Seq(dataType, StringType) +case DoubleType | DecimalType.Fixed(_, _) => Seq(dataType, IntegerType) +case _ => Seq(TypeCollection(DateType, DoubleType, DecimalType), --- End diff -- Do we need to have this case? 
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123754453 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -132,3 +133,154 @@ case class Uuid() extends LeafExpression { s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", isNull = "false") } } + +/** + * Returns date truncated to the unit specified by the format or + * numeric truncated to scale decimal places. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = """ + _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`. +If `data` is DateType, returns `data` with the time portion of the day truncated to the unit specified by the format model `fmt`. +If `data` is DecimalType/DoubleType, returns `data` truncated to `fmt` decimal places. --- End diff -- Please also describe default values (MM or 0). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123753748 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -132,3 +133,154 @@ case class Uuid() extends LeafExpression { s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", isNull = "false") } } + +/** + * Returns date truncated to the unit specified by the format or + * numeric truncated to scale decimal places. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = """ + _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`. --- End diff -- fmt -> expr? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123752789 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -132,3 +133,154 @@ case class Uuid() extends LeafExpression { s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", isNull = "false") } } + +/** + * Returns date truncated to the unit specified by the format or + * numeric truncated to scale decimal places. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = """ + _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`. +If `data` is DateType, returns `data` with the time portion of the day truncated to the unit specified by the format model `fmt`. +If `data` is DecimalType/DoubleType, returns `data` truncated to `fmt` decimal places. + """, + extended = """ +Examples: + > SELECT _FUNC_('2009-02-12', 'MM'); + 2009-02-01. + > SELECT _FUNC_('2015-10-27', 'YEAR'); + 2015-01-01 + > SELECT _FUNC_('1989-03-13'); + 1989-03-01 + > SELECT _FUNC_(1234567891.1234567891, 4); + 1234567891.1234 + > SELECT _FUNC_(1234567891.1234567891, -4); + 123456 + > SELECT _FUNC_(1234567891.1234567891); + 1234567891 + """) +// scalastyle:on line.size.limit +case class Trunc(data: Expression, format: Expression) --- End diff -- truncParam? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123752317 --- Diff: python/pyspark/sql/functions.py --- @@ -1028,20 +1028,28 @@ def to_timestamp(col, format=None): @since(1.5) -def trunc(date, format): +def trunc(data, format): """ -Returns date truncated to the unit specified by the format. +Returns date truncated to the unit specified by the format or +number truncated by specified decimal places. :param format: 'year', '', 'yy' or 'month', 'mon', 'mm' --- End diff -- And we need to update this param doc. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123684235 --- Diff: python/pyspark/sql/functions.py --- @@ -1028,20 +1028,28 @@ def to_timestamp(col, format=None): @since(1.5) -def trunc(date, format): +def trunc(data, format): """ -Returns date truncated to the unit specified by the format. +Returns date truncated to the unit specified by the format or +number truncated by specified decimal places. --- End diff -- in the latter case, where `data` is a number, calling the 2nd parameter `format` seems a bit out of place?
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123190922 --- Diff: sql/core/src/test/resources/sql-tests/inputs/operators.sql --- @@ -92,3 +91,7 @@ select abs(-3.13), abs('-2.19'); -- positive/negative select positive('-1.11'), positive(-1.11), negative('-1.11'), negative(-1.11); + +-- trunc number +select trunc(1234567891.1234567891, 4), trunc(1234567891.1234567891, -4), trunc(1234567891.1234567891, 0), trunc(1234567891.1234567891); +select trunc(1234567891.1234567891, null), trunc(null, 4), trunc(null, null); --- End diff -- Can you add test cases like `trunc(1234567891.1234567891, "")` and `trunc('2015-07-22', 4)`? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123161910 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -132,3 +133,141 @@ case class Uuid() extends LeafExpression { s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", isNull = "false") } } + +/** + * Returns date truncated to the unit specified by the format or + * numeric truncated to scale decimal places. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = """ + _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`. +If `data` is DateType, returns `data` with the time portion of the day truncated to the unit specified by the format model `fmt`. +If `data` is DecimalType/DoubleType, returns `data` truncated to `fmt` decimal places. + """, + extended = """ +Examples: + > SELECT _FUNC_('2009-02-12', 'MM'); + 2009-02-01. + > SELECT _FUNC_('2015-10-27', 'YEAR'); + 2015-01-01 + > SELECT _FUNC_('1989-03-13'); + 1989-03-01 + > SELECT _FUNC_(1234567891.1234567891, 4); + 1234567891.1234 + > SELECT _FUNC_(1234567891.1234567891, -4); + 123456 + > SELECT _FUNC_(1234567891.1234567891); + 1234567891 + """) +// scalastyle:on line.size.limit +case class Trunc(data: Expression, format: Expression) + extends BinaryExpression with ImplicitCastInputTypes { + + def this(data: Expression) = { +this(data, Literal(if (data.dataType.isInstanceOf[DateType]) "MM" else 0)) + } + + override def left: Expression = data + override def right: Expression = format + + val isTruncNumber = format.dataType.isInstanceOf[IntegerType] + + override def dataType: DataType = data.dataType + + override def inputTypes: Seq[AbstractDataType] = +Seq(TypeCollection(DateType, DoubleType, DecimalType), + TypeCollection(StringType, IntegerType)) --- End diff -- If we are going to have only `trunc` for truncating number and datetime. We should prevent wrong input types. 
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18106#discussion_r123155483 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala --- @@ -132,3 +133,141 @@ case class Uuid() extends LeafExpression { s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", isNull = "false") } } + +/** + * Returns date truncated to the unit specified by the format or + * numeric truncated to scale decimal places. + */ +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = """ + _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`. +If `data` is DateType, returns `data` with the time portion of the day truncated to the unit specified by the format model `fmt`. +If `data` is DecimalType/DoubleType, returns `data` truncated to `fmt` decimal places. + """, + extended = """ +Examples: + > SELECT _FUNC_('2009-02-12', 'MM'); + 2009-02-01. + > SELECT _FUNC_('2015-10-27', 'YEAR'); + 2015-01-01 + > SELECT _FUNC_('1989-03-13'); + 1989-03-01 + > SELECT _FUNC_(1234567891.1234567891, 4); + 1234567891.1234 + > SELECT _FUNC_(1234567891.1234567891, -4); + 123456 + > SELECT _FUNC_(1234567891.1234567891); + 1234567891 + """) +// scalastyle:on line.size.limit +case class Trunc(data: Expression, format: Expression) + extends BinaryExpression with ImplicitCastInputTypes { + + def this(data: Expression) = { +this(data, Literal(if (data.dataType.isInstanceOf[DateType]) "MM" else 0)) + } + + override def left: Expression = data + override def right: Expression = format + + val isTruncNumber = format.dataType.isInstanceOf[IntegerType] + + override def dataType: DataType = data.dataType + + override def inputTypes: Seq[AbstractDataType] = +Seq(TypeCollection(DateType, DoubleType, DecimalType), + TypeCollection(StringType, IntegerType)) --- End diff -- I think this might lead to wrong input types combinations such as (DoubleType, StringType) and (DateType, IntegerType)? 
[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/18106

[SPARK-20754][SQL] Support TRUNC (number)

## What changes were proposed in this pull request?

Add support for `TRUNC(number)`; it is similar to Oracle's [TRUNC(number)](https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions200.htm):

```sql
> SELECT TRUNC(1234567891.1234567891, 4);
 1.2345678911234E9
> SELECT TRUNC(1234567891.1234567891, -4);
 1.23456E9
> SELECT TRUNC(1234567891.1234567891, 0);
 1.234567891E9
> SELECT TRUNC(1234567891.1234567891);
 1.234567891E9
```

## How was this patch tested?

Unit tests.

You can merge this pull request into a Git repository by running:

 $ git pull https://github.com/wangyum/spark SPARK-20754-trunc

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/18106.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #18106

commit a5ade70afe7601db16ec24956f270feb4499ee42
Author: Yuming Wang
Date: 2017-05-25T10:21:15Z

 Support TRUNC (number)
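For reference, the number-truncation semantics described above can be sketched in a few lines of Python with the standard `decimal` module (a hypothetical illustration of the Oracle-style behavior, not the Spark implementation; `trunc_number` is an invented name):

```python
from decimal import Decimal, ROUND_DOWN

def trunc_number(value, scale=0):
    # Truncate toward zero at `scale` decimal places; a negative scale
    # zeroes out digits to the left of the decimal point, Oracle-style.
    quantum = Decimal(1).scaleb(-scale)  # 10 ** -scale, e.g. 0.0001 for scale=4
    return Decimal(value).quantize(quantum, rounding=ROUND_DOWN)

print(trunc_number("1234567891.1234567891", 4))   # 1234567891.1234
print(trunc_number("1234567891.1234567891", -4))  # 1.23456E+9
print(trunc_number("1234567891.1234567891"))      # 1234567891
```

`ROUND_DOWN` (truncation toward zero) rather than `ROUND_FLOOR` is what distinguishes TRUNC from FLOOR for negative inputs.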