[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2018-09-13 Thread wangyum
Github user wangyum closed the pull request at:

https://github.com/apache/spark/pull/18106


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-08-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r130837526
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -1028,20 +1028,29 @@ def to_timestamp(col, format=None):
 
 
 @since(1.5)
-def trunc(date, format):
+def trunc(data, truncParam):
--- End diff --

Yes, but it brings complexity for both args and kwargs, e.g., when both are set, the method signature in the docs, etc. I wonder if it is that important.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---




[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-08-02 Thread holdenk
Github user holdenk commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r130833154
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -1028,20 +1028,29 @@ def to_timestamp(col, format=None):
 
 
 @since(1.5)
-def trunc(date, format):
+def trunc(data, truncParam):
--- End diff --

We can work around this with kwargs if it's important to change the name.
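A minimal sketch of the kwargs workaround mentioned here (hypothetical shim, not the actual pyspark code): keep the new positional names but still accept the legacy keyword names and forward them.

```python
def trunc(data=None, truncParam=None, **kwargs):
    """Illustrative backward-compat shim for a renamed-parameter function."""
    # Accept the pre-rename keyword names `date` and `format`.
    if "date" in kwargs:
        if data is not None:
            raise TypeError("got both 'data' and legacy 'date'")
        data = kwargs.pop("date")
    if "format" in kwargs:
        if truncParam is not None:
            raise TypeError("got both 'truncParam' and legacy 'format'")
        truncParam = kwargs.pop("format")
    if kwargs:
        raise TypeError("unexpected keyword arguments: %s" % sorted(kwargs))
    return (data, truncParam)

# Both old-style and new-style keyword calls now work:
assert trunc(date="1997-02-28", format="year") == ("1997-02-28", "year")
assert trunc(data="1997-02-28", truncParam="year") == ("1997-02-28", "year")
```

As the reply above notes, the shim complicates the documented signature and the both-names-set error handling, which is part of the argument against the rename.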


---



[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-08-02 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r130801952
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -1028,20 +1028,29 @@ def to_timestamp(col, format=None):
 
 
 @since(1.5)
-def trunc(date, format):
+def trunc(data, truncParam):
--- End diff --

@wangyum, would you mind reverting this renaming? It breaks compatibility if a user script calls this as

```python
trunc(..., format= ...)
trunc(date=..., format= ...)
```
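The breakage described here is plain Python behavior: renaming a parameter invalidates every existing call that passed it by keyword. A minimal demonstration with hypothetical stand-in functions (not the pyspark source):

```python
# Stand-ins for the before/after signatures:
def trunc_old(date, format):
    return (date, format)

def trunc_new(data, truncParam):
    return (data, truncParam)

# Positional calls are unaffected by the rename...
trunc_new("2009-02-12", "MM")

# ...but any script that used the old keyword names now fails:
try:
    trunc_new(date="2009-02-12", format="MM")
except TypeError as e:
    print("keyword call broken:", e)  # unexpected keyword argument 'date'
```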


---



[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r124329456
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -1028,20 +1028,29 @@ def to_timestamp(col, format=None):
 
 
 @since(1.5)
-def trunc(date, format):
+def trunc(data, truncParam):
--- End diff --

I believe this definitely breaks backward compatibility for 
keyword-argument usage in Python.


---



[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-27 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r124328743
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -1028,20 +1028,29 @@ def to_timestamp(col, format=None):
 
 
 @since(1.5)
-def trunc(date, format):
+def trunc(data, truncParam):
--- End diff --

@ueshin @holdenk re: changing param name in python. 


---



[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-25 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123920115
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
@@ -132,3 +133,154 @@ case class Uuid() extends LeafExpression {
   s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", 
isNull = "false")
   }
 }
+
+/**
+ * Returns date truncated to the unit specified by the format or
+ * numeric truncated to scale decimal places.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+  _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`.
+    If `data` is DateType, returns `data` with the time portion of the day truncated to the unit specified by the format model `fmt`.
+    If `data` is DecimalType/DoubleType, returns `data` truncated to `fmt` decimal places.
+  """,
+  extended = """
+Examples:
+  > SELECT _FUNC_('2009-02-12', 'MM');
--- End diff --

Yes, this is what I worry about.


---



[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-25 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123919660
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
@@ -132,3 +133,154 @@ case class Uuid() extends LeafExpression {
   s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", 
isNull = "false")
   }
 }
+
+/**
+ * Returns date truncated to the unit specified by the format or
+ * numeric truncated to scale decimal places.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+  _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`.
+    If `data` is DateType, returns `data` with the time portion of the day truncated to the unit specified by the format model `fmt`.
+    If `data` is DecimalType/DoubleType, returns `data` truncated to `fmt` decimal places.
+  """,
+  extended = """
+Examples:
+  > SELECT _FUNC_('2009-02-12', 'MM');
+   2009-02-01.
+  > SELECT _FUNC_('2015-10-27', 'YEAR');
+   2015-01-01
+  > SELECT _FUNC_('1989-03-13');
+   1989-03-01
+  > SELECT _FUNC_(1234567891.1234567891, 4);
+   1234567891.1234
+  > SELECT _FUNC_(1234567891.1234567891, -4);
+   123456
+  > SELECT _FUNC_(1234567891.1234567891);
+   1234567891
+  """)
+// scalastyle:on line.size.limit
+case class Trunc(data: Expression, format: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  def this(data: Expression) = {
+this(data, Literal(if (data.dataType.isInstanceOf[DateType]) "MM" else 
0))
+  }
+
+  override def left: Expression = data
+  override def right: Expression = format
+
+  override def dataType: DataType = data.dataType
+
+  override def inputTypes: Seq[AbstractDataType] = dataType match {
+case NullType => Seq(dataType, TypeCollection(StringType, IntegerType))
+case DateType => Seq(dataType, StringType)
+case DoubleType | DecimalType.Fixed(_, _) => Seq(dataType, IntegerType)
+case _ => Seq(TypeCollection(DateType, DoubleType, DecimalType),
--- End diff --

Added this case to show all the supported types:
```
 > select trunc(false, 'MON'); 
Error in query: cannot resolve 'trunc(false, 'MON')' due to data type 
mismatch: argument 1 requires (date or double or decimal) type, however, 
'false' is of boolean type.; line 1 pos 7;
'Project [unresolvedalias(trunc(false, MON), None)]
+- OneRowRelation$
```
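The numeric examples in the quoted `@ExpressionDescription` imply truncation toward zero, with a negative scale dropping the trailing integer digits outright (so `trunc(1234567891.1234567891, -4)` yields `123456`). A Python sketch under that reading of the examples (not the PR's Scala implementation):

```python
import math

def trunc_number(x, scale=0):
    """Truncate x toward zero to `scale` decimal places (sketch of the
    semantics implied by the PR's examples)."""
    if scale >= 0:
        factor = 10 ** scale
        return math.trunc(x * factor) / factor
    # Negative scale: drop the last |scale| integer digits, matching the
    # quoted example trunc(1234567891.1234567891, -4) -> 123456.
    return math.trunc(x / 10 ** -scale)

assert trunc_number(1234567891.1234567891, 4) == 1234567891.1234
assert trunc_number(1234567891.1234567891, -4) == 123456
assert trunc_number(1234567891.1234567891) == 1234567891
```

Note that this differs from Oracle/Hive `TRUNC`, where a negative scale zero-fills rather than drops digits (`TRUNC(1234567891.1234, -4)` returns `1234560000` there), which may be worth reconciling.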


---



[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-24 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123884847
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -1028,20 +1028,28 @@ def to_timestamp(col, format=None):
 
 
 @since(1.5)
-def trunc(date, format):
+def trunc(data, format):
 """
-Returns date truncated to the unit specified by the format.
+Returns date truncated to the unit specified by the format or
+number truncated by specified decimal places.
 
 :param format: 'year', '', 'yy' or 'month', 'mon', 'mm'
 
 >>> df = spark.createDataFrame([('1997-02-28',)], ['d'])
->>> df.select(trunc(df.d, 'year').alias('year')).collect()
+>>> df.select(trunc(to_date(df.d), 'year').alias('year')).collect()
 [Row(year=datetime.date(1997, 1, 1))]
->>> df.select(trunc(df.d, 'mon').alias('month')).collect()
+>>> df.select(trunc(to_date(df.d), 'mon').alias('month')).collect()
--- End diff --

this could be a bigger problem :)


---



[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-24 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123884830
  
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -1382,8 +1382,8 @@ test_that("column functions", {
   c20 <- to_timestamp(c) + to_timestamp(c, "") + to_date(c, "")
   c21 <- posexplode_outer(c) + explode_outer(c)
   c22 <- not(c)
-  c23 <- trunc(c, "year") + trunc(c, "") + trunc(c, "yy") +
-trunc(c, "month") + trunc(c, "mon") + trunc(c, "mm")
+  c23 <- trunc(to_date(c), "year") + trunc(to_date(c), "") + 
trunc(to_date(c), "yy") +
+trunc(to_date(c), "month") + trunc(to_date(c), "mon") + 
trunc(to_date(c), "mm")
--- End diff --

that's a good point. Fortunately (?) trunc was only added to R in 2.3.0, so I think we need to make sure (manually, and by adding a unit test) that trunc works on both date columns and numeric columns.


---



[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-24 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123871837
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MiscExpressionsSuite.scala ---
@@ -44,4 +46,49 @@ class MiscExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 assert(evaluate(Uuid()) !== evaluate(Uuid()))
   }
 
+  test("trunc") {
+// numeric
+def testTruncNumber(input: Double, fmt: Int, expected: Double): Unit = 
{
+  checkEvaluation(Trunc(Literal.create(input, DoubleType),
+Literal.create(fmt, IntegerType)),
+expected)
+  checkEvaluation(Trunc(Literal.create(input, DoubleType),
+NonFoldableLiteral.create(fmt, IntegerType)),
+expected)
+}
+
+testTruncNumber(1234567891.1234567891, 4, 1234567891.1234)
+testTruncNumber(1234567891.1234567891, -4, 123456)
+testTruncNumber(1234567891.1234567891, 0, 1234567891)
--- End diff --

Also check testTruncNumber(0.1234567891, -1, 0.1234567891)?


---



[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-24 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123871820
  
--- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MiscExpressionsSuite.scala ---
@@ -44,4 +46,49 @@ class MiscExpressionsSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 assert(evaluate(Uuid()) !== evaluate(Uuid()))
   }
 
+  test("trunc") {
+// numeric
+def testTruncNumber(input: Double, fmt: Int, expected: Double): Unit = 
{
+  checkEvaluation(Trunc(Literal.create(input, DoubleType),
+Literal.create(fmt, IntegerType)),
+expected)
+  checkEvaluation(Trunc(Literal.create(input, DoubleType),
+NonFoldableLiteral.create(fmt, IntegerType)),
+expected)
+}
+
+testTruncNumber(1234567891.1234567891, 4, 1234567891.1234)
+testTruncNumber(1234567891.1234567891, -4, 123456)
+testTruncNumber(1234567891.1234567891, 0, 1234567891)
+
+checkEvaluation(Trunc(Literal.create(1D, DoubleType),
+  NonFoldableLiteral.create(null, IntegerType)),
+  null)
+checkEvaluation(Trunc(Literal.create(null, DoubleType),
+  NonFoldableLiteral.create(1, IntegerType)),
+  null)
+checkEvaluation(Trunc(Literal.create(null, DoubleType),
+  NonFoldableLiteral.create(null, IntegerType)),
+  null)
+
+// date
--- End diff --

Shall we split this test into two tests for numeric and date respectively?


---



[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123831200
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
@@ -132,3 +133,154 @@ case class Uuid() extends LeafExpression {
   s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", 
isNull = "false")
   }
 }
+
+/**
+ * Returns date truncated to the unit specified by the format or
+ * numeric truncated to scale decimal places.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+  _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`.
+    If `data` is DateType, returns `data` with the time portion of the day truncated to the unit specified by the format model `fmt`.
+    If `data` is DecimalType/DoubleType, returns `data` truncated to `fmt` decimal places.
+  """,
+  extended = """
+Examples:
+  > SELECT _FUNC_('2009-02-12', 'MM');
--- End diff --

I guess this also drops support for other types (e.g., timestamp), basically because we don't allow implicit casts (e.g., `SELECT trunc(timestamp('2009-02-12'), 'MM')`).


---



[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123831034
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
@@ -132,3 +133,154 @@ case class Uuid() extends LeafExpression {
   s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", 
isNull = "false")
   }
 }
+
+/**
+ * Returns date truncated to the unit specified by the format or
+ * numeric truncated to scale decimal places.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+  _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`.
+    If `data` is DateType, returns `data` with the time portion of the day truncated to the unit specified by the format model `fmt`.
+    If `data` is DecimalType/DoubleType, returns `data` truncated to `fmt` decimal places.
--- End diff --

Also, I would describe the types without class names (e.g., date).


---



[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123828403
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -2067,6 +2067,18 @@ object functions {
*/
   def radians(columnName: String): Column = radians(Column(columnName))
 
+  /**
+   * returns number truncated by specified decimal places.
--- End diff --

little nit: `returns` -> `Returns`


---



[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123828339
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -2067,6 +2067,18 @@ object functions {
*/
   def radians(columnName: String): Column = radians(Column(columnName))
 
+  /**
+   * returns number truncated by specified decimal places.
+   *
+   * @param scale: 4. -4, 0
--- End diff --

Let's describe the param.


---



[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123827347
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -2067,6 +2067,18 @@ object functions {
*/
   def radians(columnName: String): Column = radians(Column(columnName))
 
+  /**
+   * returns number truncated by specified decimal places.
+   *
+   * @param scale: 4. -4, 0
+   *
+   * @group math_funcs
+   * @since 2.3.0
+   */
+  def trunc(db: Column, scale: Int = 0): Column = withExpr {
--- End diff --

I would avoid using a default value here unless we are very sure about it (e.g., that other languages can call this without any problem, that it preserves the method signature and does not break binary compatibility, etc.).


---



[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123824736
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -1028,20 +1028,28 @@ def to_timestamp(col, format=None):
 
 
 @since(1.5)
-def trunc(date, format):
+def trunc(data, format):
 """
-Returns date truncated to the unit specified by the format.
+Returns date truncated to the unit specified by the format or
+number truncated by specified decimal places.
 
 :param format: 'year', '', 'yy' or 'month', 'mon', 'mm'
 
 >>> df = spark.createDataFrame([('1997-02-28',)], ['d'])
->>> df.select(trunc(df.d, 'year').alias('year')).collect()
+>>> df.select(trunc(to_date(df.d), 'year').alias('year')).collect()
 [Row(year=datetime.date(1997, 1, 1))]
->>> df.select(trunc(df.d, 'mon').alias('month')).collect()
+>>> df.select(trunc(to_date(df.d), 'mon').alias('month')).collect()
--- End diff --

Here too. If this change is needed to pass Python tests, it looks like it breaks compatibility.


---



[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123822928
  
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -1382,8 +1382,8 @@ test_that("column functions", {
   c20 <- to_timestamp(c) + to_timestamp(c, "") + to_date(c, "")
   c21 <- posexplode_outer(c) + explode_outer(c)
   c22 <- not(c)
-  c23 <- trunc(c, "year") + trunc(c, "") + trunc(c, "yy") +
-trunc(c, "month") + trunc(c, "mon") + trunc(c, "mm")
+  c23 <- trunc(to_date(c), "year") + trunc(to_date(c), "") + 
trunc(to_date(c), "yy") +
+trunc(to_date(c), "month") + trunc(to_date(c), "mon") + 
trunc(to_date(c), "mm")
--- End diff --

Ah, just to be clear, I meant https://github.com/apache/spark/pull/18106/files/7fee61b1e084a1ae9966e7ad62b1509085b24151#r123758795. The current state looks like it drops support for calling this function with a string that can be implicitly cast to a date, and `to_date` is now required, which breaks backward compatibility.


---



[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-23 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123798230
  
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -1382,8 +1382,8 @@ test_that("column functions", {
   c20 <- to_timestamp(c) + to_timestamp(c, "") + to_date(c, "")
   c21 <- posexplode_outer(c) + explode_outer(c)
   c22 <- not(c)
-  c23 <- trunc(c, "year") + trunc(c, "") + trunc(c, "yy") +
-trunc(c, "month") + trunc(c, "mon") + trunc(c, "mm")
+  c23 <- trunc(to_date(c), "year") + trunc(to_date(c), "") + 
trunc(to_date(c), "yy") +
+trunc(to_date(c), "month") + trunc(to_date(c), "mon") + 
trunc(to_date(c), "mm")
--- End diff --

R should support both date and number too


---



[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123759792
  
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -1382,8 +1382,8 @@ test_that("column functions", {
   c20 <- to_timestamp(c) + to_timestamp(c, "") + to_date(c, "")
   c21 <- posexplode_outer(c) + explode_outer(c)
   c22 <- not(c)
-  c23 <- trunc(c, "year") + trunc(c, "") + trunc(c, "yy") +
-trunc(c, "month") + trunc(c, "mon") + trunc(c, "mm")
+  c23 <- trunc(to_date(c), "year") + trunc(to_date(c), "") + 
trunc(to_date(c), "yy") +
+trunc(to_date(c), "month") + trunc(to_date(c), "mon") + 
trunc(to_date(c), "mm")
--- End diff --

Yes. I don't think we should do this.


---



[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-23 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123759339
  
--- Diff: R/pkg/tests/fulltests/test_sparkSQL.R ---
@@ -1382,8 +1382,8 @@ test_that("column functions", {
   c20 <- to_timestamp(c) + to_timestamp(c, "") + to_date(c, "")
   c21 <- posexplode_outer(c) + explode_outer(c)
   c22 <- not(c)
-  c23 <- trunc(c, "year") + trunc(c, "") + trunc(c, "yy") +
-trunc(c, "month") + trunc(c, "mon") + trunc(c, "mm")
+  c23 <- trunc(to_date(c), "year") + trunc(to_date(c), "") + 
trunc(to_date(c), "yy") +
+trunc(to_date(c), "month") + trunc(to_date(c), "mon") + 
trunc(to_date(c), "mm")
--- End diff --

Isn't this actually a behaviour change?


---



[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123758795
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
@@ -132,3 +133,154 @@ case class Uuid() extends LeafExpression {
   s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", 
isNull = "false")
   }
 }
+
+/**
+ * Returns date truncated to the unit specified by the format or
+ * numeric truncated to scale decimal places.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+  _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`.
+    If `data` is DateType, returns `data` with the time portion of the day truncated to the unit specified by the format model `fmt`.
+    If `data` is DecimalType/DoubleType, returns `data` truncated to `fmt` decimal places.
+  """,
+  extended = """
+Examples:
+  > SELECT _FUNC_('2009-02-12', 'MM');
--- End diff --

And I don't think we should drop this support.


---



[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123758531
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
@@ -132,3 +133,154 @@ case class Uuid() extends LeafExpression {
   s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", 
isNull = "false")
   }
 }
+
+/**
+ * Returns date truncated to the unit specified by the format or
+ * numeric truncated to scale decimal places.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+  _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`.
+If `data` is DateType, returns `data` with the time portion of the day truncated to the unit specified by the format model `fmt`.
+If `data` is DecimalType/DoubleType, returns `data` truncated to `fmt` decimal places.
+  """,
+  extended = """
+Examples:
+  > SELECT _FUNC_('2009-02-12', 'MM');
--- End diff --

As it no longer extends `ImplicitCastInputTypes`, I think you can't directly use a string as the date parameter?





[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123757175
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
@@ -132,3 +133,154 @@ case class Uuid() extends LeafExpression {
   s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", isNull = "false")
   }
 }
+
+/**
+ * Returns date truncated to the unit specified by the format or
+ * numeric truncated to scale decimal places.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+  _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`.
+If `data` is DateType, returns `data` with the time portion of the day truncated to the unit specified by the format model `fmt`.
+If `data` is DecimalType/DoubleType, returns `data` truncated to `fmt` decimal places.
+  """,
+  extended = """
+Examples:
+  > SELECT _FUNC_('2009-02-12', 'MM');
+   2009-02-01.
+  > SELECT _FUNC_('2015-10-27', 'YEAR');
+   2015-01-01
+  > SELECT _FUNC_('1989-03-13');
+   1989-03-01
+  > SELECT _FUNC_(1234567891.1234567891, 4);
+   1234567891.1234
+  > SELECT _FUNC_(1234567891.1234567891, -4);
+   123456
+  > SELECT _FUNC_(1234567891.1234567891);
+   1234567891
+  """)
+// scalastyle:on line.size.limit
+case class Trunc(data: Expression, format: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  def this(data: Expression) = {
+this(data, Literal(if (data.dataType.isInstanceOf[DateType]) "MM" else 0))
+  }
+
+  override def left: Expression = data
+  override def right: Expression = format
+
+  override def dataType: DataType = data.dataType
+
+  override def inputTypes: Seq[AbstractDataType] = dataType match {
+case NullType => Seq(dataType, TypeCollection(StringType, IntegerType))
+case DateType => Seq(dataType, StringType)
+case DoubleType | DecimalType.Fixed(_, _) => Seq(dataType, IntegerType)
+case _ => Seq(TypeCollection(DateType, DoubleType, DecimalType),
+  TypeCollection(StringType, IntegerType))
+  }
+
+  override def nullable: Boolean = true
+
+  override def prettyName: String = "trunc"
+
+  private val isTruncNumber =
+(dataType.isInstanceOf[DoubleType] || dataType.isInstanceOf[DecimalType]) &&
+  format.dataType.isInstanceOf[IntegerType]
--- End diff --

Combined with this and `inputTypes`, once we have input types like (DoubleType, StringType), it looks like `eval` simply returns null. That doesn't seem like a correct result to me.
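The concern above can be sketched outside Catalyst. In plain Python (names are illustrative, not Spark's API), an explicit check on the (data, format) pairing turns the mismatch into a visible error instead of a silent null:

```python
from datetime import date

def check_trunc_args(data, fmt):
    """Pair each data type with its only sensible format type.

    Hypothetical sketch: dates truncate by a string unit, numbers by an
    integer scale; any cross combination is a type error, not a null.
    """
    if isinstance(data, date):
        if not isinstance(fmt, str):
            raise TypeError("date input requires a string format, e.g. 'MM'")
    elif isinstance(data, (int, float)):
        if not isinstance(fmt, int) or isinstance(fmt, bool):
            raise TypeError("numeric input requires an integer scale")
    else:
        raise TypeError(f"unsupported input type: {type(data).__name__}")

check_trunc_args(date(2015, 7, 22), "MM")  # ok
check_trunc_args(1234.5678, 4)             # ok
# check_trunc_args(1234.5678, "MM")        # would raise TypeError
```

Whether a mismatch should be an analysis-time error or a null result is exactly the design question the review is raising.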





[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123756528
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
@@ -132,3 +133,154 @@ case class Uuid() extends LeafExpression {
   s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", isNull = "false")
   }
 }
+
+/**
+ * Returns date truncated to the unit specified by the format or
+ * numeric truncated to scale decimal places.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+  _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`.
+If `data` is DateType, returns `data` with the time portion of the day truncated to the unit specified by the format model `fmt`.
+If `data` is DecimalType/DoubleType, returns `data` truncated to `fmt` decimal places.
+  """,
+  extended = """
+Examples:
+  > SELECT _FUNC_('2009-02-12', 'MM');
+   2009-02-01.
+  > SELECT _FUNC_('2015-10-27', 'YEAR');
+   2015-01-01
+  > SELECT _FUNC_('1989-03-13');
+   1989-03-01
+  > SELECT _FUNC_(1234567891.1234567891, 4);
+   1234567891.1234
+  > SELECT _FUNC_(1234567891.1234567891, -4);
+   123456
+  > SELECT _FUNC_(1234567891.1234567891);
+   1234567891
+  """)
+// scalastyle:on line.size.limit
+case class Trunc(data: Expression, format: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  def this(data: Expression) = {
+this(data, Literal(if (data.dataType.isInstanceOf[DateType]) "MM" else 0))
+  }
+
+  override def left: Expression = data
+  override def right: Expression = format
+
+  override def dataType: DataType = data.dataType
+
+  override def inputTypes: Seq[AbstractDataType] = dataType match {
+case NullType => Seq(dataType, TypeCollection(StringType, IntegerType))
+case DateType => Seq(dataType, StringType)
+case DoubleType | DecimalType.Fixed(_, _) => Seq(dataType, IntegerType)
+case _ => Seq(TypeCollection(DateType, DoubleType, DecimalType),
+  TypeCollection(StringType, IntegerType))
+  }
+
+  override def nullable: Boolean = true
+
+  override def prettyName: String = "trunc"
+
+  private val isTruncNumber =
+(dataType.isInstanceOf[DoubleType] || dataType.isInstanceOf[DecimalType]) &&
+  format.dataType.isInstanceOf[IntegerType]
+  private val isTruncDate =
+dataType.isInstanceOf[DateType] && format.dataType.isInstanceOf[StringType]
+
+  private lazy val truncFormat: Int = if (isTruncNumber) {
+format.eval().asInstanceOf[Int]
+  } else if (isTruncDate) {
+DateTimeUtils.parseTruncLevel(format.eval().asInstanceOf[UTF8String])
+  } else {
+0
+  }
+
+  override def eval(input: InternalRow): Any = {
--- End diff --

override `nullSafeEval`?
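For context, `nullSafeEval` on Catalyst's `BinaryExpression` handles null operands once in the base class, so the expression only has to implement the non-null case. A rough Python analogue of that contract (illustrative names, not Spark's API):

```python
def null_safe_eval(fn):
    # Return None when either operand is None; otherwise delegate.
    # Rough analogue of BinaryExpression.nullSafeEval: the wrapper does the
    # null checks so the wrapped function can assume both inputs are present.
    def wrapper(left, right):
        if left is None or right is None:
            return None
        return fn(left, right)
    return wrapper

@null_safe_eval
def trunc_to_scale(value, scale):
    factor = 10 ** scale
    return int(value * factor) / factor  # truncate toward zero

print(trunc_to_scale(3.14159, 2))  # 3.14
print(trunc_to_scale(None, 2))     # None
```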





[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123755745
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
@@ -132,3 +133,154 @@ case class Uuid() extends LeafExpression {
   s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", isNull = "false")
   }
 }
+
+/**
+ * Returns date truncated to the unit specified by the format or
+ * numeric truncated to scale decimal places.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+  _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`.
+If `data` is DateType, returns `data` with the time portion of the day truncated to the unit specified by the format model `fmt`.
+If `data` is DecimalType/DoubleType, returns `data` truncated to `fmt` decimal places.
+  """,
+  extended = """
+Examples:
+  > SELECT _FUNC_('2009-02-12', 'MM');
+   2009-02-01.
+  > SELECT _FUNC_('2015-10-27', 'YEAR');
+   2015-01-01
+  > SELECT _FUNC_('1989-03-13');
+   1989-03-01
+  > SELECT _FUNC_(1234567891.1234567891, 4);
+   1234567891.1234
+  > SELECT _FUNC_(1234567891.1234567891, -4);
+   123456
+  > SELECT _FUNC_(1234567891.1234567891);
+   1234567891
+  """)
+// scalastyle:on line.size.limit
+case class Trunc(data: Expression, format: Expression)
+  extends BinaryExpression with ExpectsInputTypes {
+
+  def this(data: Expression) = {
+this(data, Literal(if (data.dataType.isInstanceOf[DateType]) "MM" else 0))
+  }
+
+  override def left: Expression = data
+  override def right: Expression = format
+
+  override def dataType: DataType = data.dataType
+
+  override def inputTypes: Seq[AbstractDataType] = dataType match {
+case NullType => Seq(dataType, TypeCollection(StringType, IntegerType))
+case DateType => Seq(dataType, StringType)
+case DoubleType | DecimalType.Fixed(_, _) => Seq(dataType, IntegerType)
+case _ => Seq(TypeCollection(DateType, DoubleType, DecimalType),
--- End diff --

Do we need to have this case?





[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123754453
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
@@ -132,3 +133,154 @@ case class Uuid() extends LeafExpression {
   s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", isNull = "false")
   }
 }
+
+/**
+ * Returns date truncated to the unit specified by the format or
+ * numeric truncated to scale decimal places.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+  _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`.
+If `data` is DateType, returns `data` with the time portion of the day truncated to the unit specified by the format model `fmt`.
+If `data` is DecimalType/DoubleType, returns `data` truncated to `fmt` decimal places.
--- End diff --

Please also describe default values (MM or 0).
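As a hedged illustration of the defaulting rule being asked about (assumed semantics inferred from the examples above, not Spark's implementation), a Python sketch where dates default to `'MM'` and numbers to scale `0` might look like:

```python
from datetime import date

def trunc(data, fmt=None):
    """Hypothetical sketch of the defaulting rule the comment asks to
    document: dates default to 'MM' truncation, numbers to scale 0."""
    if isinstance(data, date):
        fmt = "MM" if fmt is None else fmt
        if fmt in ("MM", "MON", "MONTH"):
            return data.replace(day=1)
        if fmt in ("YEAR", "YYYY", "YY"):
            return data.replace(month=1, day=1)
        return None  # unrecognized unit
    scale = 0 if fmt is None else fmt
    factor = 10 ** scale
    return int(data * factor) / factor  # truncate toward zero

print(trunc(date(1989, 3, 13)))      # 1989-03-01
print(trunc(1234567891.1234567891))  # 1234567891.0
```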





[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123753748
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
@@ -132,3 +133,154 @@ case class Uuid() extends LeafExpression {
   s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", isNull = "false")
   }
 }
+
+/**
+ * Returns date truncated to the unit specified by the format or
+ * numeric truncated to scale decimal places.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+  _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`.
--- End diff --

fmt -> expr?





[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123752789
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
@@ -132,3 +133,154 @@ case class Uuid() extends LeafExpression {
   s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", isNull = "false")
   }
 }
+
+/**
+ * Returns date truncated to the unit specified by the format or
+ * numeric truncated to scale decimal places.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+  _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`.
+If `data` is DateType, returns `data` with the time portion of the day truncated to the unit specified by the format model `fmt`.
+If `data` is DecimalType/DoubleType, returns `data` truncated to `fmt` decimal places.
+  """,
+  extended = """
+Examples:
+  > SELECT _FUNC_('2009-02-12', 'MM');
+   2009-02-01.
+  > SELECT _FUNC_('2015-10-27', 'YEAR');
+   2015-01-01
+  > SELECT _FUNC_('1989-03-13');
+   1989-03-01
+  > SELECT _FUNC_(1234567891.1234567891, 4);
+   1234567891.1234
+  > SELECT _FUNC_(1234567891.1234567891, -4);
+   123456
+  > SELECT _FUNC_(1234567891.1234567891);
+   1234567891
+  """)
+// scalastyle:on line.size.limit
+case class Trunc(data: Expression, format: Expression)
--- End diff --

truncParam?





[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-23 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123752317
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -1028,20 +1028,28 @@ def to_timestamp(col, format=None):
 
 
 @since(1.5)
-def trunc(date, format):
+def trunc(data, format):
 """
-Returns date truncated to the unit specified by the format.
+Returns date truncated to the unit specified by the format or
+number truncated by specified decimal places.
 
  :param format: 'year', 'yyyy', 'yy' or 'month', 'mon', 'mm'
--- End diff --

And we need to update this param doc.





[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-23 Thread felixcheung
Github user felixcheung commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123684235
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -1028,20 +1028,28 @@ def to_timestamp(col, format=None):
 
 
 @since(1.5)
-def trunc(date, format):
+def trunc(data, format):
 """
-Returns date truncated to the unit specified by the format.
+Returns date truncated to the unit specified by the format or
+number truncated by specified decimal places.
--- End diff --

in the latter case, where `data` is a number, calling the 2nd parameter `format` seems a bit out of place?





[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-21 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123190922
  
--- Diff: sql/core/src/test/resources/sql-tests/inputs/operators.sql ---
@@ -92,3 +91,7 @@ select abs(-3.13), abs('-2.19');
 
 -- positive/negative
select positive('-1.11'), positive(-1.11), negative('-1.11'), negative(-1.11);
+
+-- trunc number
+select trunc(1234567891.1234567891, 4), trunc(1234567891.1234567891, -4), trunc(1234567891.1234567891, 0), trunc(1234567891.1234567891);
+select trunc(1234567891.1234567891, null), trunc(null, 4), trunc(null, null);
--- End diff --

Can you add test cases like `trunc(1234567891.1234567891, "")` and `trunc('2015-07-22', 4)`?





[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-21 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123161910
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
@@ -132,3 +133,141 @@ case class Uuid() extends LeafExpression {
   s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", isNull = "false")
   }
 }
+
+/**
+ * Returns date truncated to the unit specified by the format or
+ * numeric truncated to scale decimal places.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+  _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`.
+If `data` is DateType, returns `data` with the time portion of the day truncated to the unit specified by the format model `fmt`.
+If `data` is DecimalType/DoubleType, returns `data` truncated to `fmt` decimal places.
+  """,
+  extended = """
+Examples:
+  > SELECT _FUNC_('2009-02-12', 'MM');
+   2009-02-01.
+  > SELECT _FUNC_('2015-10-27', 'YEAR');
+   2015-01-01
+  > SELECT _FUNC_('1989-03-13');
+   1989-03-01
+  > SELECT _FUNC_(1234567891.1234567891, 4);
+   1234567891.1234
+  > SELECT _FUNC_(1234567891.1234567891, -4);
+   123456
+  > SELECT _FUNC_(1234567891.1234567891);
+   1234567891
+  """)
+// scalastyle:on line.size.limit
+case class Trunc(data: Expression, format: Expression)
+  extends BinaryExpression with ImplicitCastInputTypes {
+
+  def this(data: Expression) = {
+this(data, Literal(if (data.dataType.isInstanceOf[DateType]) "MM" else 0))
+  }
+
+  override def left: Expression = data
+  override def right: Expression = format
+
+  val isTruncNumber = format.dataType.isInstanceOf[IntegerType]
+
+  override def dataType: DataType = data.dataType
+
+  override def inputTypes: Seq[AbstractDataType] =
+Seq(TypeCollection(DateType, DoubleType, DecimalType),
+  TypeCollection(StringType, IntegerType))
--- End diff --

If we are going to have only `trunc` for truncating both numbers and datetimes, we should prevent wrong input types.





[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-06-20 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/18106#discussion_r123155483
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ---
@@ -132,3 +133,141 @@ case class Uuid() extends LeafExpression {
   s"UTF8String.fromString(java.util.UUID.randomUUID().toString());", isNull = "false")
   }
 }
+
+/**
+ * Returns date truncated to the unit specified by the format or
+ * numeric truncated to scale decimal places.
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = """
+  _FUNC_(data[, fmt]) - Returns `data` truncated by the format model `fmt`.
+If `data` is DateType, returns `data` with the time portion of the day truncated to the unit specified by the format model `fmt`.
+If `data` is DecimalType/DoubleType, returns `data` truncated to `fmt` decimal places.
+  """,
+  extended = """
+Examples:
+  > SELECT _FUNC_('2009-02-12', 'MM');
+   2009-02-01.
+  > SELECT _FUNC_('2015-10-27', 'YEAR');
+   2015-01-01
+  > SELECT _FUNC_('1989-03-13');
+   1989-03-01
+  > SELECT _FUNC_(1234567891.1234567891, 4);
+   1234567891.1234
+  > SELECT _FUNC_(1234567891.1234567891, -4);
+   123456
+  > SELECT _FUNC_(1234567891.1234567891);
+   1234567891
+  """)
+// scalastyle:on line.size.limit
+case class Trunc(data: Expression, format: Expression)
+  extends BinaryExpression with ImplicitCastInputTypes {
+
+  def this(data: Expression) = {
+this(data, Literal(if (data.dataType.isInstanceOf[DateType]) "MM" else 0))
+  }
+
+  override def left: Expression = data
+  override def right: Expression = format
+
+  val isTruncNumber = format.dataType.isInstanceOf[IntegerType]
+
+  override def dataType: DataType = data.dataType
+
+  override def inputTypes: Seq[AbstractDataType] =
+Seq(TypeCollection(DateType, DoubleType, DecimalType),
+  TypeCollection(StringType, IntegerType))
--- End diff --

I think this might lead to wrong input-type combinations such as (DoubleType, StringType) and (DateType, IntegerType)?





[GitHub] spark pull request #18106: [SPARK-20754][SQL] Support TRUNC (number)

2017-05-25 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/18106

[SPARK-20754][SQL] Support TRUNC (number)

## What changes were proposed in this pull request?

Add support for `TRUNC(number)`, which is similar to Oracle [TRUNC(number)](https://docs.oracle.com/cd/B19306_01/server.102/b14200/functions200.htm):

```sql
> SELECT TRUNC(1234567891.1234567891, 4);
 1.2345678911234E9
> SELECT TRUNC(1234567891.1234567891, -4);
 1.23456E9
> SELECT TRUNC(1234567891.1234567891, 0);
 1.234567891E9
> SELECT TRUNC(1234567891.1234567891);
 1.234567891E9
```
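The scale semantics shown above (truncate toward zero; a negative scale zeroes out digits left of the decimal point) can be sketched in a few lines of Python using `Decimal`, whose divide-integer operator `//` already truncates toward zero. This is an illustrative sketch of the arithmetic only, not Spark's implementation:

```python
from decimal import Decimal

def trunc_number(value, scale=0):
    # Truncate `value` toward zero to `scale` decimal places.
    # A negative `scale` zeroes out digits left of the decimal point,
    # mirroring Oracle-style TRUNC(number).
    d = Decimal(str(value))
    factor = Decimal(10) ** -scale
    return (d // factor) * factor  # Decimal `//` truncates toward zero

print(trunc_number("1234567891.1234567891", 4))   # 1234567891.1234
print(trunc_number("1234567891.1234567891", -4))  # 1234560000
print(trunc_number("1234567891.1234567891"))      # 1234567891
```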

## How was this patch tested?
unit tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-20754-trunc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18106.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18106


commit a5ade70afe7601db16ec24956f270feb4499ee42
Author: Yuming Wang 
Date:   2017-05-25T10:21:15Z

Support TRUNC (number)



