[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-18 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/19884#discussion_r157683878
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowPythonRunner.scala
 ---
@@ -126,18 +121,14 @@ class ArrowPythonRunner(
   private var schema: StructType = _
   private var vectors: Array[ColumnVector] = _
 
-  private var closed = false
-
   context.addTaskCompletionListener { _ =>
 // todo: we need something like `reader.end()`, which release all 
the resources, but leave
--- End diff --

@ueshin is this still a valid todo? It is now possible in Arrow by calling 
`reader.close(false)`, where the `false` flag frees resources but keeps the 
stream open.
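
For reference, a minimal sketch of what resolving the todo could look like; the helper name and surrounding wiring are illustrative rather than the actual patch, and the import path is assumed and may differ between Arrow releases:

```scala
import org.apache.arrow.vector.ipc.ArrowStreamReader // path assumed; may vary by Arrow version
import org.apache.spark.TaskContext

// Hypothetical sketch: on task completion, release the reader's Arrow buffers
// with `close(false)`, which frees resources but leaves the stream open.
def registerArrowCleanup(context: TaskContext, reader: ArrowStreamReader): Unit = {
  context.addTaskCompletionListener { _ =>
    reader.close(false)
  }
}
```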


---




[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-18 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/19884#discussion_r157683384
  
--- Diff: python/pyspark/sql/types.py ---
@@ -1679,6 +1678,15 @@ def from_arrow_schema(arrow_schema):
  for field in arrow_schema])
 
 
+def _require_minimum_pyarrow_version():
--- End diff --

@ueshin @HyukjinKwon just want to verify that we want to check for a 
minimum version of pyarrow, and that this is the right place to put this function?


---




[GitHub] spark pull request #20014: [SPARK-22827][CORE] Avoid throwing OutOfMemoryErr...

2017-12-18 Thread sitalkedia
Github user sitalkedia commented on a diff in the pull request:

https://github.com/apache/spark/pull/20014#discussion_r157682902
  
--- Diff: 
core/src/main/java/org/apache/spark/memory/SparkOutOfMemoryError.java ---
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.memory;
+
+/**
+ * This exception is thrown when a task cannot acquire memory from the memory manager.
+ * Instead of throwing {@link OutOfMemoryError}, which kills the executor,
+ * we should throw this exception, which will just kill the current task.
+ */
+public final class SparkOutOfMemoryError extends OutOfMemoryError {
--- End diff --

Yes, this is an internal class. How do you suggest labeling it?
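
For what it's worth, one way to label it, sketched here as a suggestion rather than the decided approach, is Spark's existing `@Private` annotation for internal APIs (shown in Scala form; the real class is Java, and the class name here is hypothetical):

```scala
import org.apache.spark.annotation.Private

// Hedged sketch: annotate the class so consumers know it is not a public API.
@Private
class SparkOutOfMemoryErrorSketch(message: String) extends OutOfMemoryError(message)
```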


---




[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19884
  
**[Test build #85099 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85099/testReport)**
 for PR 19884 at commit 
[`22c6b92`](https://github.com/apache/spark/commit/22c6b92fc6ab31c332715c9372cfc1fe27835ccb).


---




[GitHub] spark issue #20016: SPARK-22830 Scala Coding style has been improved in Spar...

2017-12-18 Thread chetkhatri
Github user chetkhatri commented on the issue:

https://github.com/apache/spark/pull/20016
  
+ @holdenk @sameeragarwal 


---




[GitHub] spark issue #20016: SPARK-22830 Scala Coding style has been improved in Spar...

2017-12-18 Thread chetkhatri
Github user chetkhatri commented on the issue:

https://github.com/apache/spark/pull/20016
  
@HyukjinKwon I agree with you! But since Spark is a Scala project, a lot of 
developers refer to the examples available here, and if they are Java developers 
they might not understand that this is the right way to do it in Scala. I had the 
same discussion with developers at Scala Days, and I think it makes sense 
here too.


---




[GitHub] spark pull request #20015: [SPARK-22829] Add new built-in function date_trun...

2017-12-18 Thread gczsjdy
Github user gczsjdy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20015#discussion_r157676669
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ---
@@ -1295,87 +1295,184 @@ case class ParseToTimestamp(left: Expression, 
format: Option[Expression], child:
   override def dataType: DataType = TimestampType
 }
 
-/**
- * Returns date truncated to the unit specified by the format.
- */
-// scalastyle:off line.size.limit
-@ExpressionDescription(
-  usage = "_FUNC_(date, fmt) - Returns `date` with the time portion of the 
day truncated to the unit specified by the format model `fmt`.",
-  examples = """
-Examples:
-  > SELECT _FUNC_('2009-02-12', 'MM');
-   2009-02-01
-  > SELECT _FUNC_('2015-10-27', 'YEAR');
-   2015-01-01
-  """,
-  since = "1.5.0")
-// scalastyle:on line.size.limit
-case class TruncDate(date: Expression, format: Expression)
-  extends BinaryExpression with ImplicitCastInputTypes {
-  override def left: Expression = date
-  override def right: Expression = format
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType, 
StringType)
-  override def dataType: DataType = DateType
+trait TruncTime extends BinaryExpression with ImplicitCastInputTypes {
+  val time: Expression
+  val format: Expression
   override def nullable: Boolean = true
-  override def prettyName: String = "trunc"
 
   private lazy val truncLevel: Int =
 DateTimeUtils.parseTruncLevel(format.eval().asInstanceOf[UTF8String])
 
-  override def eval(input: InternalRow): Any = {
+  /**
+   *
+   * @param input
+   * @param maxLevel Maximum level that can be used for truncation (e.g 
MONTH for Date input)
+   * @param truncFunc
+   * @tparam T
+   * @return
+   */
+  protected def evalHelper[T](input: InternalRow, maxLevel: Int)(
+truncFunc: (Any, Int) => T): Any = {
 val level = if (format.foldable) {
   truncLevel
 } else {
   DateTimeUtils.parseTruncLevel(format.eval().asInstanceOf[UTF8String])
 }
-if (level == -1) {
+if (level == DateTimeUtils.TRUNC_INVALID || level > maxLevel) {
--- End diff --

`// unknown format or too small level`?


---




[GitHub] spark pull request #20015: [SPARK-22829] Add new built-in function date_trun...

2017-12-18 Thread gczsjdy
Github user gczsjdy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20015#discussion_r157678588
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ---
@@ -1295,87 +1295,184 @@ case class ParseToTimestamp(left: Expression, 
format: Option[Expression], child:
   override def dataType: DataType = TimestampType
 }
 
-/**
- * Returns date truncated to the unit specified by the format.
- */
-// scalastyle:off line.size.limit
-@ExpressionDescription(
-  usage = "_FUNC_(date, fmt) - Returns `date` with the time portion of the 
day truncated to the unit specified by the format model `fmt`.",
-  examples = """
-Examples:
-  > SELECT _FUNC_('2009-02-12', 'MM');
-   2009-02-01
-  > SELECT _FUNC_('2015-10-27', 'YEAR');
-   2015-01-01
-  """,
-  since = "1.5.0")
-// scalastyle:on line.size.limit
-case class TruncDate(date: Expression, format: Expression)
-  extends BinaryExpression with ImplicitCastInputTypes {
-  override def left: Expression = date
-  override def right: Expression = format
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType, 
StringType)
-  override def dataType: DataType = DateType
+trait TruncTime extends BinaryExpression with ImplicitCastInputTypes {
+  val time: Expression
+  val format: Expression
   override def nullable: Boolean = true
-  override def prettyName: String = "trunc"
 
   private lazy val truncLevel: Int =
 DateTimeUtils.parseTruncLevel(format.eval().asInstanceOf[UTF8String])
 
-  override def eval(input: InternalRow): Any = {
+  /**
+   *
+   * @param input
+   * @param maxLevel Maximum level that can be used for truncation (e.g 
MONTH for Date input)
+   * @param truncFunc
+   * @tparam T
+   * @return
+   */
+  protected def evalHelper[T](input: InternalRow, maxLevel: Int)(
+truncFunc: (Any, Int) => T): Any = {
 val level = if (format.foldable) {
   truncLevel
 } else {
   DateTimeUtils.parseTruncLevel(format.eval().asInstanceOf[UTF8String])
 }
-if (level == -1) {
+if (level == DateTimeUtils.TRUNC_INVALID || level > maxLevel) {
   // unknown format
   null
 } else {
-  val d = date.eval(input)
+  val d = time.eval(input)
   if (d == null) {
 null
   } else {
-DateTimeUtils.truncDate(d.asInstanceOf[Int], level)
+truncFunc(d, level)
   }
 }
   }
 
-  override def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
+  protected def codeGenHelper[T](
--- End diff --

Why do we need a type parameter `T`?


---




[GitHub] spark pull request #20015: [SPARK-22829] Add new built-in function date_trun...

2017-12-18 Thread gczsjdy
Github user gczsjdy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20015#discussion_r157680290
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ---
@@ -944,9 +954,16 @@ object DateTimeUtils {
 date + daysToMonthEnd
   }
 
-  private val TRUNC_TO_YEAR = 1
-  private val TRUNC_TO_MONTH = 2
-  private val TRUNC_INVALID = -1
+  // Visible for testing.
+  val TRUNC_TO_YEAR = 1
+  val TRUNC_TO_MONTH = 2
+  val TRUNC_TO_DAY = 3
+  val TRUNC_TO_HOUR = 4
+  val TRUNC_TO_MINUTE = 5
+  val TRUNC_TO_SECOND = 6
+  val TRUNC_TO_WEEK = 7
+  val TRUNC_TO_QUARTER = 8
+  val TRUNC_INVALID = -1
--- End diff --

Can we bring quarter and week forward, maybe to 3 and 4? Then it conforms 
better to the order of time granularity, and the max-level design is not affected.
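
For concreteness, a sketch of the suggested renumbering; the exact values are illustrative, not part of the patch:

```scala
// Hypothetical renumbering: QUARTER and WEEK move up to 3 and 4, closer to
// their granularity, while staying above the Date input's maxLevel (MONTH),
// so the `level > maxLevel` check is unaffected.
val TRUNC_TO_YEAR = 1
val TRUNC_TO_MONTH = 2
val TRUNC_TO_QUARTER = 3 // moved forward from 8
val TRUNC_TO_WEEK = 4    // moved forward from 7
val TRUNC_TO_DAY = 5
val TRUNC_TO_HOUR = 6
val TRUNC_TO_MINUTE = 7
val TRUNC_TO_SECOND = 8
val TRUNC_INVALID = -1
```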


---




[GitHub] spark pull request #20015: [SPARK-22829] Add new built-in function date_trun...

2017-12-18 Thread gczsjdy
Github user gczsjdy commented on a diff in the pull request:

https://github.com/apache/spark/pull/20015#discussion_r157674840
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ---
@@ -1295,87 +1295,184 @@ case class ParseToTimestamp(left: Expression, 
format: Option[Expression], child:
   override def dataType: DataType = TimestampType
 }
 
-/**
- * Returns date truncated to the unit specified by the format.
- */
-// scalastyle:off line.size.limit
-@ExpressionDescription(
-  usage = "_FUNC_(date, fmt) - Returns `date` with the time portion of the 
day truncated to the unit specified by the format model `fmt`.",
-  examples = """
-Examples:
-  > SELECT _FUNC_('2009-02-12', 'MM');
-   2009-02-01
-  > SELECT _FUNC_('2015-10-27', 'YEAR');
-   2015-01-01
-  """,
-  since = "1.5.0")
-// scalastyle:on line.size.limit
-case class TruncDate(date: Expression, format: Expression)
-  extends BinaryExpression with ImplicitCastInputTypes {
-  override def left: Expression = date
-  override def right: Expression = format
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType, 
StringType)
-  override def dataType: DataType = DateType
+trait TruncTime extends BinaryExpression with ImplicitCastInputTypes {
+  val time: Expression
+  val format: Expression
   override def nullable: Boolean = true
-  override def prettyName: String = "trunc"
 
   private lazy val truncLevel: Int =
 DateTimeUtils.parseTruncLevel(format.eval().asInstanceOf[UTF8String])
 
-  override def eval(input: InternalRow): Any = {
+  /**
+   *
+   * @param input
+   * @param maxLevel Maximum level that can be used for truncation (e.g 
MONTH for Date input)
+   * @param truncFunc
+   * @tparam T
+   * @return
+   */
+  protected def evalHelper[T](input: InternalRow, maxLevel: Int)(
+truncFunc: (Any, Int) => T): Any = {
 val level = if (format.foldable) {
   truncLevel
 } else {
   DateTimeUtils.parseTruncLevel(format.eval().asInstanceOf[UTF8String])
 }
-if (level == -1) {
+if (level == DateTimeUtils.TRUNC_INVALID || level > maxLevel) {
   // unknown format
   null
 } else {
-  val d = date.eval(input)
+  val d = time.eval(input)
--- End diff --

nit: Since this is a time, it can be `val t = ...`


---




[GitHub] spark issue #20016: SPARK-22830 Scala Coding style has been improved in Spar...

2017-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20016
  
Can one of the admins verify this patch?


---




[GitHub] spark pull request #20016: SPARK-22830 Scala Coding style has been improved ...

2017-12-18 Thread chetkhatri
GitHub user chetkhatri opened a pull request:

https://github.com/apache/spark/pull/20016

SPARK-22830 Scala Coding style has been improved in Spark Examples

## What changes were proposed in this pull request?

* Under the Spark Scala examples, some of the syntax was written the Java way; it has been rewritten per the Scala style guide.
* Most of the changes concern println() statements, as sketched below.
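
An illustration of the kind of rewrite described (a self-contained sketch, not lines taken from the diff):

```scala
object PrintlnStyleSketch {
  def main(args: Array[String]): Unit = {
    val (count, n) = (785, 1000)
    // Java-style string concatenation, as found in some examples:
    println("Pi is roughly " + 4.0 * count / n)
    // Scala-style string interpolation, the direction of this PR:
    println(s"Pi is roughly ${4.0 * count / n}")
  }
}
```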

## How was this patch tested?

Since all proposed changes rewrite println statements in the Scala way, a 
manual run was used to test the println output.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chetkhatri/spark scala-style-spark-examples

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20016.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20016


commit 4ac1cb1c2aa6f72eee339e8b8b647647e879d91f
Author: chetkhatri 
Date:   2017-12-19T07:17:37Z

SPARK-22830 Scala Coding style has been improved in Spark Examples




---




[GitHub] spark pull request #20015: [SPARK-22829] Add new built-in function date_trun...

2017-12-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20015#discussion_r157673626
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ---
@@ -1295,87 +1295,184 @@ case class ParseToTimestamp(left: Expression, 
format: Option[Expression], child:
   override def dataType: DataType = TimestampType
 }
 
-/**
- * Returns date truncated to the unit specified by the format.
- */
-// scalastyle:off line.size.limit
-@ExpressionDescription(
-  usage = "_FUNC_(date, fmt) - Returns `date` with the time portion of the 
day truncated to the unit specified by the format model `fmt`.",
-  examples = """
-Examples:
-  > SELECT _FUNC_('2009-02-12', 'MM');
-   2009-02-01
-  > SELECT _FUNC_('2015-10-27', 'YEAR');
-   2015-01-01
-  """,
-  since = "1.5.0")
-// scalastyle:on line.size.limit
-case class TruncDate(date: Expression, format: Expression)
-  extends BinaryExpression with ImplicitCastInputTypes {
-  override def left: Expression = date
-  override def right: Expression = format
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType, 
StringType)
-  override def dataType: DataType = DateType
+trait TruncTime extends BinaryExpression with ImplicitCastInputTypes {
+  val time: Expression
+  val format: Expression
   override def nullable: Boolean = true
-  override def prettyName: String = "trunc"
 
   private lazy val truncLevel: Int =
 DateTimeUtils.parseTruncLevel(format.eval().asInstanceOf[UTF8String])
 
-  override def eval(input: InternalRow): Any = {
+  /**
+   *
+   * @param input
--- End diff --

Seems `input` and `truncFunc` descriptions missing.
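
For illustration, the filled-in doc comment could read roughly as follows; the wording is guessed from the surrounding diff, not the author's:

```scala
/**
 * Evaluates the truncation, returning null for an unknown format or for a
 * level finer than `maxLevel`.
 *
 * @param input the row against which `time` is evaluated
 * @param maxLevel maximum level that can be used for truncation (e.g. MONTH for Date input)
 * @param truncFunc function that truncates the evaluated time value to the given level
 * @tparam T the result type produced by `truncFunc`
 * @return the truncated value, or null
 */
```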


---




[GitHub] spark pull request #20015: [SPARK-22829] Add new built-in function date_trun...

2017-12-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20015#discussion_r157677311
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -,6 +,24 @@ def trunc(date, format):
 return Column(sc._jvm.functions.trunc(_to_java_column(date), format))
 
 
+@since(2.3)
+def date_trunc(format, timestamp):
+"""
+Returns timestamp truncated to the unit specified by the format.
+
+:param format: 'year', 'yyyy', 'yy', 'month', 'mon', 'mm',
+'DAY', 'DD', 'HOUR', 'MINUTE', 'SECOND', 'WEEK', 'QUARTER'
--- End diff --

Could we make those lowercased too?


---




[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

2017-12-18 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request:

https://github.com/apache/spark/pull/19884#discussion_r157677769
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala
 ---
@@ -86,21 +86,16 @@ private[sql] object ArrowConverters {
 val root = VectorSchemaRoot.create(arrowSchema, allocator)
 val arrowWriter = ArrowWriter.create(root)
 
-var closed = false
-
 context.addTaskCompletionListener { _ =>
-  if (!closed) {
-root.close()
-allocator.close()
-  }
+  root.close()
+  allocator.close()
--- End diff --

Yes thanks for the reminder, they are updated now also.


---




[GitHub] spark pull request #20015: [SPARK-22829] Add new built-in function date_trun...

2017-12-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20015#discussion_r157677136
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -2797,6 +2797,21 @@ object functions {
 TruncDate(date.expr, Literal(format))
   }
 
+  /**
+   * Returns timestamp truncated to the unit specified by the format.
+   *
+   * @param format: 'year', 'yyyy', 'yy' for truncate by year,
+   *   'month', 'mon', 'mm' for truncate by month,
--- End diff --

nit: one space each more.


---




[GitHub] spark pull request #20015: [SPARK-22829] Add new built-in function date_trun...

2017-12-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20015#discussion_r157677400
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -2797,6 +2797,21 @@ object functions {
 TruncDate(date.expr, Literal(format))
   }
 
+  /**
+   * Returns timestamp truncated to the unit specified by the format.
+   *
+   * @param format: 'year', 'yyyy', 'yy' for truncate by year,
+   *   'month', 'mon', 'mm' for truncate by month,
+   *   'day', 'dd' for truncate by day,
+   *   Other options are: second, minute, hour, week, month, 
quarter
--- End diff --

Maybe, `'second', 'minute', 'hour', 'week', 'month' and 'quarter'`


---




[GitHub] spark pull request #20015: [SPARK-22829] Add new built-in function date_trun...

2017-12-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20015#discussion_r157673559
  
--- Diff: python/pyspark/sql/functions.py ---
@@ -,6 +,24 @@ def trunc(date, format):
 return Column(sc._jvm.functions.trunc(_to_java_column(date), format))
 
 
+@since(2.3)
+def date_trunc(format, timestamp):
+"""
+Returns timestamp truncated to the unit specified by the format.
+
+:param format: 'year', 'yyyy', 'yy', 'month', 'mon', 'mm',
+'DAY', 'DD', 'HOUR', 'MINUTE', 'SECOND', 'WEEK', 'QUARTER'
+
+>>> df = spark.createDataFrame([('1997-02-28',)], ['d'])
--- End diff --

Can we use a timestamp string like `1997-02-28 05:02:11` to show the 
difference from `trunc` a bit more clearly?
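
A sketch of the contrast being asked for, written against the Scala API (assuming a `SparkSession` named `spark`; behaviour as documented in this PR):

```scala
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq("1997-02-28 05:02:11").toDF("d").select(to_timestamp($"d").as("d"))

// trunc drops the time portion entirely and returns a date:
df.select(trunc($"d", "year")).show()       // 1997-01-01
// date_trunc returns a timestamp, so sub-day units are meaningful:
df.select(date_trunc("hour", $"d")).show()  // 1997-02-28 05:00:00
```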


---




[GitHub] spark pull request #20015: [SPARK-22829] Add new built-in function date_trun...

2017-12-18 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/20015#discussion_r157675835
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ---
@@ -1295,87 +1295,184 @@ case class ParseToTimestamp(left: Expression, 
format: Option[Expression], child:
   override def dataType: DataType = TimestampType
 }
 
-/**
- * Returns date truncated to the unit specified by the format.
- */
-// scalastyle:off line.size.limit
-@ExpressionDescription(
-  usage = "_FUNC_(date, fmt) - Returns `date` with the time portion of the 
day truncated to the unit specified by the format model `fmt`.",
-  examples = """
-Examples:
-  > SELECT _FUNC_('2009-02-12', 'MM');
-   2009-02-01
-  > SELECT _FUNC_('2015-10-27', 'YEAR');
-   2015-01-01
-  """,
-  since = "1.5.0")
-// scalastyle:on line.size.limit
-case class TruncDate(date: Expression, format: Expression)
-  extends BinaryExpression with ImplicitCastInputTypes {
-  override def left: Expression = date
-  override def right: Expression = format
-
-  override def inputTypes: Seq[AbstractDataType] = Seq(DateType, 
StringType)
-  override def dataType: DataType = DateType
+trait TruncTime extends BinaryExpression with ImplicitCastInputTypes {
--- End diff --

Maybe `TruncInstant`? I received this advice before and I liked it too. Not 
a big deal tho.


---




[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape

2017-12-18 Thread ep1804
Github user ep1804 commented on the issue:

https://github.com/apache/spark/pull/20004
  
`CSVOptions` is changed NOT to set `charToEscapeQuoteEscaping` to `\u0000` 
by default, to allow the uniVocity parser to use `escape` as the 
`charToEscapeQuoteEscaping` character.


---




[GitHub] spark issue #19975: [SPARK-22781][SS] Support creating streaming dataset wit...

2017-12-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19975
  
**[Test build #85098 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85098/testReport)**
 for PR 19975 at commit 
[`29094cc`](https://github.com/apache/spark/commit/29094cc79d0025989df34061832b908880c64323).


---




[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape

2017-12-18 Thread ep1804
Github user ep1804 commented on the issue:

https://github.com/apache/spark/pull/20004
  
When `charToEscapeQuoteEscaping` is not set and `quote` and `escape` are 
different, the uniVocity parser uses the `escape` character as 
`charToEscapeQuoteEscaping` by default. This is why the test passes.

Ref: 
https://github.com/uniVocity/univocity-parsers/blob/v2.5.9/src/main/java/com/univocity/parsers/csv/CsvFormat.java#L149-L157

I changed the test case so that the text to be parsed uses a 
`charToEscapeQuoteEscaping` which is different from its `escape` character.

Though this case seems rare, it is presumably why the uniVocity parser 
exposes this option.
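
To make the interaction concrete, a hedged sketch of setting the new option explicitly through the DataFrame reader (the file path is hypothetical; assuming a `SparkSession` named `spark`):

```scala
// With quote `"` and escape `\`, leaving charToEscapeQuoteEscaping unset lets
// the uniVocity parser default it to the escape character; setting it to a
// different character exercises the case the updated test covers.
val df = spark.read
  .option("quote", "\"")
  .option("escape", "\\")
  .option("charToEscapeQuoteEscaping", "#")
  .csv("/path/to/data.csv")
```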


---




[GitHub] spark issue #19975: [SPARK-22781][SS] Support creating streaming dataset wit...

2017-12-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue:

https://github.com/apache/spark/pull/19975
  
Retest this please


---




[GitHub] spark issue #19984: [SPARK-22789] Map-only continuous processing execution

2017-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19984
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85089/
Test FAILed.


---




[GitHub] spark issue #19984: [SPARK-22789] Map-only continuous processing execution

2017-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19984
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19984: [SPARK-22789] Map-only continuous processing execution

2017-12-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19984
  
**[Test build #85089 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85089/testReport)**
 for PR 19984 at commit 
[`359ebdd`](https://github.com/apache/spark/commit/359ebdd8bdac0b93aa6b88beab0212393f1e2577).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape

2017-12-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20004
  
**[Test build #85097 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85097/testReport)**
 for PR 20004 at commit 
[`9680a4e`](https://github.com/apache/spark/commit/9680a4ed2613d2d7ef02afb52030cd90f75deb88).


---




[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape

2017-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20004
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape

2017-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20004
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85096/
Test FAILed.


---




[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape

2017-12-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20004
  
**[Test build #85096 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85096/testReport)**
 for PR 20004 at commit 
[`50dc2c0`](https://github.com/apache/spark/commit/50dc2c0983540c3d25d2970dbd7c1cb6d973f5e6).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20004: [Spark-22818][SQL] csv escape of quote escape

2017-12-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20004
  
**[Test build #85096 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85096/testReport)**
 for PR 20004 at commit 
[`50dc2c0`](https://github.com/apache/spark/commit/50dc2c0983540c3d25d2970dbd7c1cb6d973f5e6).


---




[GitHub] spark issue #19149: [SPARK-21652][SQL][FOLLOW-UP] Fix rule conflict caused b...

2017-12-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19149
  
**[Test build #85095 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85095/testReport)**
 for PR 19149 at commit 
[`9b6fe36`](https://github.com/apache/spark/commit/9b6fe3644c3fc760de9ddf33c16bc1733e7c971d).


---




[GitHub] spark issue #19149: [SPARK-21652][SQL][FOLLOW-UP] Fix rule conflict caused b...

2017-12-18 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/19149
  
retest this please


---




[GitHub] spark issue #20014: [SPARK-22827][CORE] Avoid throwing OutOfMemoryError in c...

2017-12-18 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/20014
  
Overall change lgtm.



---




[GitHub] spark pull request #20014: [SPARK-22827][CORE] Avoid throwing OutOfMemoryErr...

2017-12-18 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/20014#discussion_r157673852
  
--- Diff: 
core/src/main/java/org/apache/spark/memory/SparkOutOfMemoryError.java ---
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.memory;
+
+/**
+ * This exception is thrown when a task cannot acquire memory from the memory manager.
+ * Instead of throwing {@link OutOfMemoryError}, which kills the executor,
+ * we should throw this exception, which will just kill the current task.
+ */
+public final class SparkOutOfMemoryError extends OutOfMemoryError {
--- End diff --

is this an internal class? if yes perhaps we should label it.



---




[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...

2017-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19950
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...

2017-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19950
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85084/
Test PASSed.


---




[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...

2017-12-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19950
  
**[Test build #85084 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85084/testReport)**
 for PR 19950 at commit 
[`daba630`](https://github.com/apache/spark/commit/daba630343cb1d7f3ad137a75aaffb2a29b99cb6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class LabeledPointSuite extends SparkFunSuite `
  * `class TreePointSuite extends SparkFunSuite `


---




[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...

2017-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19950
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85085/
Test PASSed.


---




[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...

2017-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19950
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...

2017-12-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19950
  
**[Test build #85085 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85085/testReport)**
 for PR 19950 at commit 
[`6f51096`](https://github.com/apache/spark/commit/6f51096060645abeb0e9b9568bca6dbb6213b16a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20012: [SPARK-22824] Restore old offset for binary compatibilit...

2017-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20012
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85088/
Test PASSed.


---




[GitHub] spark issue #20012: [SPARK-22824] Restore old offset for binary compatibilit...

2017-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20012
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20012: [SPARK-22824] Restore old offset for binary compatibilit...

2017-12-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20012
  
**[Test build #85088 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85088/testReport)**
 for PR 20012 at commit 
[`97fd2f9`](https://github.com/apache/spark/commit/97fd2f968d417bd7e690cf961efaac77bc2ef08d).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark pull request #19746: [SPARK-22346][ML] VectorSizeHint Transformer for ...

2017-12-18 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19746#discussion_r157668450
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/feature/VectorSizeHint.scala ---
@@ -0,0 +1,195 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.feature
+
+import org.apache.spark.SparkException
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.Transformer
+import org.apache.spark.ml.attribute.AttributeGroup
+import org.apache.spark.ml.linalg.{Vector, VectorUDT}
+import org.apache.spark.ml.param.{IntParam, Param, ParamMap, 
ParamValidators}
+import org.apache.spark.ml.param.shared.{HasHandleInvalid, HasInputCol}
+import org.apache.spark.ml.util.{DefaultParamsReadable, 
DefaultParamsWritable, Identifiable}
+import org.apache.spark.sql.{Column, DataFrame, Dataset}
+import org.apache.spark.sql.functions.{col, udf}
+import org.apache.spark.sql.types.StructType
+
+/**
+ * :: Experimental ::
+ * A feature transformer that adds size information to the metadata of a 
vector column.
+ * VectorAssembler needs size information for its input columns and cannot 
be used on streaming
+ * dataframes without this metadata.
+ *
+ */
+@Experimental
+@Since("2.3.0")
+class VectorSizeHint @Since("2.3.0") (@Since("2.3.0") override val uid: 
String)
+  extends Transformer with HasInputCol with HasHandleInvalid with 
DefaultParamsWritable {
+
+  @Since("2.3.0")
+  def this() = this(Identifiable.randomUID("vectSizeHint"))
+
+  /**
+   * The size of Vectors in `inputCol`.
+   * @group param
+   */
+  @Since("2.3.0")
+  val size: IntParam = new IntParam(
+this,
+"size",
+"Size of vectors in column.",
+{s: Int => s >= 0})
+
+  /** group getParam */
+  @Since("2.3.0")
+  def getSize: Int = getOrDefault(size)
+
+  /** @group setParam */
+  @Since("2.3.0")
+  def setSize(value: Int): this.type = set(size, value)
+
+  /** @group setParam */
+  @Since("2.3.0")
+  def setInputCol(value: String): this.type = set(inputCol, value)
+
+  /**
+   * Param for how to handle invalid entries. Invalid vectors include 
nulls and vectors with the
+   * wrong size. The options are `skip` (filter out rows with invalid 
vectors), `error` (throw an
+   * error) and `optimistic` (do not check the vector size, and keep all 
row\). `error` by default.
--- End diff --

"row\" ==> "rows"


---




[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

2017-12-18 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19985#discussion_r157667492
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
 ---
@@ -52,4 +53,40 @@ class DataSourceScanExecRedactionSuite extends QueryTest 
with SharedSQLContext {
   assert(df.queryExecution.simpleString.contains(replacement))
 }
   }
+
+  private def isIncluded(queryExecution: QueryExecution, msg: String): 
Boolean = {
+queryExecution.toString.contains(msg) ||
+queryExecution.simpleString.contains(msg) ||
+queryExecution.stringWithStats.contains(msg)
+  }
+
+  test("explain is redacted using SQLConf") {
+withTempDir { dir =>
+  val basePath = dir.getCanonicalPath
+  spark.range(0, 10).toDF("a").write.parquet(new Path(basePath, 
"foo=1").toString)
+  val df = spark.read.parquet(basePath)
+  val replacement = "*"
+
+  // Respect SparkConf and replace file:/
+  assert(isIncluded(df.queryExecution, replacement))
+
+  assert(isIncluded(df.queryExecution, "FileScan"))
+  assert(!isIncluded(df.queryExecution, "file:/"))
+
+  withSQLConf(SQLConf.SQL_STRING_REDACTION_PATTERN.key -> 
"(?i)FileScan") {
+// Respect SQLConf and replace FileScan
+assert(isIncluded(df.queryExecution, replacement))
+
+assert(!isIncluded(df.queryExecution, "FileScan"))
+assert(isIncluded(df.queryExecution, "file:/"))
+  }
+
+  // Respect SparkConf and replace file:/
+  assert(isIncluded(df.queryExecution, replacement))
+
+  assert(isIncluded(df.queryExecution, "FileScan"))
+  assert(!isIncluded(df.queryExecution, "file:/"))
--- End diff --

Removed


---




[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()

2017-12-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20015
  
OK. I am fine if you all guys strongly feel about this.


---




[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()

2017-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20015
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85083/
Test PASSed.


---




[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()

2017-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20015
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()

2017-12-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20015
  
**[Test build #85083 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85083/testReport)**
 for PR 20015 at commit 
[`f94f401`](https://github.com/apache/spark/commit/f94f401bcfd765b21c3fb466041b42a605d6a814).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()

2017-12-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20015
  
> after having a discussion with @cloud-fan, @gatorsmile, @rednaxelafx and 
Reynold

Where did the discussion happen? Was this an offline discussion? I also want 
to actively join in the discussion. Many implementations of trunc work 
differently, and I think we should decide on the current behaviour after 
sufficient discussion.

If we don't fix the stuff around #14788 in the 2.3.0 timeline, it could become 
even more difficult, because we would need to keep the previous behaviour.


---




[GitHub] spark pull request #19717: [SPARK-22646] [Submission] Spark on Kubernetes - ...

2017-12-18 Thread yangw1234
Github user yangw1234 commented on a diff in the pull request:

https://github.com/apache/spark/pull/19717#discussion_r157666054
  
--- Diff: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark-base/Dockerfile 
---
@@ -0,0 +1,47 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+FROM openjdk:8-alpine
--- End diff --

@foxish The libc6-compat package did not fix our issue because libiomp5.so 
cannot be loaded.


---




[GitHub] spark issue #19954: [SPARK-22757][Kubernetes] Enable use of remote dependenc...

2017-12-18 Thread liyinan926
Github user liyinan926 commented on the issue:

https://github.com/apache/spark/pull/19954
  
@vanzin @mccheah @foxish Variables names have been shortened in 
https://github.com/apache/spark/pull/19954/commits/d50c61e46c43049b9ede615ffabd89a55c4adb0c,
 and traits with only a single implementation have been removed in 
https://github.com/apache/spark/pull/19954/commits/5b82fc06a5fff2a3ff2062c46a21d069640eeec7.
 Regarding the documentation of the concepts around steps and orchestrators and 
how they fit together, I would suggest that we have a separate PR for the 
architecture docs.  I also think we can refactor the orchestrator logic in a 
future PR to also cover executor pods so we use the same mechanism for both the 
driver and executor pods. WDYT?


---




[GitHub] spark issue #19954: [SPARK-22757][Kubernetes] Enable use of remote dependenc...

2017-12-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19954
  
**[Test build #85094 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85094/testReport)**
 for PR 19954 at commit 
[`5b82fc0`](https://github.com/apache/spark/commit/5b82fc06a5fff2a3ff2062c46a21d069640eeec7).


---




[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...

2017-12-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19981
  
**[Test build #85093 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85093/testReport)**
 for PR 19981 at commit 
[`5b64f88`](https://github.com/apache/spark/commit/5b64f881adcf34b958aa659e62a9ce93171cf109).


---




[GitHub] spark issue #19984: [SPARK-22789] Map-only continuous processing execution

2017-12-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19984
  
**[Test build #85092 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85092/testReport)**
 for PR 19984 at commit 
[`19f08a9`](https://github.com/apache/spark/commit/19f08a9c875c6e52bb75c82d196fb3a310311ffe).


---




[GitHub] spark pull request #19981: [SPARK-22786][SQL] only use AppStatusPlugin in hi...

2017-12-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19981#discussion_r157663383
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListenerSuite.scala
 ---
@@ -147,236 +159,246 @@ class SQLListenerSuite extends SparkFunSuite with 
SharedSQLContext with JsonTest
   (id, accumulatorValue)
 }.toMap
 
-bus.postToAll(SparkListenerSQLExecutionStart(
+listener.onOtherEvent(SparkListenerSQLExecutionStart(
   executionId,
   "test",
   "test",
   df.queryExecution.toString,
   SparkPlanInfo.fromSparkPlan(df.queryExecution.executedPlan),
   System.currentTimeMillis()))
 
-bus.postToAll(SparkListenerJobStart(
+listener.onJobStart(SparkListenerJobStart(
   jobId = 0,
   time = System.currentTimeMillis(),
   stageInfos = Seq(
 createStageInfo(0, 0),
 createStageInfo(1, 0)
   ),
   createProperties(executionId)))
-bus.postToAll(SparkListenerStageSubmitted(createStageInfo(0, 0)))
+
listener.onStageSubmitted(SparkListenerStageSubmitted(createStageInfo(0, 0)))
 
-assert(store.executionMetrics(0).isEmpty)
+assert(statusStore.executionMetrics(executionId).isEmpty)
 
-bus.postToAll(SparkListenerExecutorMetricsUpdate("", Seq(
+
listener.onExecutorMetricsUpdate(SparkListenerExecutorMetricsUpdate("", Seq(
   // (task id, stage id, stage attempt, accum updates)
   (0L, 0, 0, createAccumulatorInfos(accumulatorUpdates)),
   (1L, 0, 0, createAccumulatorInfos(accumulatorUpdates))
 )))
 
-checkAnswer(store.executionMetrics(0), accumulatorUpdates.mapValues(_ 
* 2))
+checkAnswer(statusStore.executionMetrics(executionId), 
accumulatorUpdates.mapValues(_ * 2))
 
 // Driver accumulator updates don't belong to this execution should be 
filtered and no
 // exception will be thrown.
-bus.postToAll(SparkListenerDriverAccumUpdates(0, Seq((999L, 2L
+listener.onOtherEvent(SparkListenerDriverAccumUpdates(0, Seq((999L, 
2L
 
-checkAnswer(store.executionMetrics(0), accumulatorUpdates.mapValues(_ 
* 2))
+checkAnswer(statusStore.executionMetrics(executionId), 
accumulatorUpdates.mapValues(_ * 2))
 
-bus.postToAll(SparkListenerExecutorMetricsUpdate("", Seq(
+
listener.onExecutorMetricsUpdate(SparkListenerExecutorMetricsUpdate("", Seq(
   // (task id, stage id, stage attempt, accum updates)
   (0L, 0, 0, createAccumulatorInfos(accumulatorUpdates)),
   (1L, 0, 0, createAccumulatorInfos(accumulatorUpdates.mapValues(_ * 
2)))
 )))
 
-checkAnswer(store.executionMetrics(0), accumulatorUpdates.mapValues(_ 
* 3))
+checkAnswer(statusStore.executionMetrics(executionId), 
accumulatorUpdates.mapValues(_ * 3))
 
 // Retrying a stage should reset the metrics
-bus.postToAll(SparkListenerStageSubmitted(createStageInfo(0, 1)))
+
listener.onStageSubmitted(SparkListenerStageSubmitted(createStageInfo(0, 1)))
 
-bus.postToAll(SparkListenerExecutorMetricsUpdate("", Seq(
+
listener.onExecutorMetricsUpdate(SparkListenerExecutorMetricsUpdate("", Seq(
   // (task id, stage id, stage attempt, accum updates)
   (0L, 0, 1, createAccumulatorInfos(accumulatorUpdates)),
   (1L, 0, 1, createAccumulatorInfos(accumulatorUpdates))
 )))
 
-checkAnswer(store.executionMetrics(0), accumulatorUpdates.mapValues(_ 
* 2))
+checkAnswer(statusStore.executionMetrics(executionId), 
accumulatorUpdates.mapValues(_ * 2))
 
 // Ignore the task end for the first attempt
-bus.postToAll(SparkListenerTaskEnd(
+listener.onTaskEnd(SparkListenerTaskEnd(
   stageId = 0,
   stageAttemptId = 0,
   taskType = "",
   reason = null,
   createTaskInfo(0, 0, accums = accumulatorUpdates.mapValues(_ * 100)),
   null))
 
-checkAnswer(store.executionMetrics(0), accumulatorUpdates.mapValues(_ 
* 2))
+checkAnswer(statusStore.executionMetrics(executionId), 
accumulatorUpdates.mapValues(_ * 2))
 
 // Finish two tasks
-bus.postToAll(SparkListenerTaskEnd(
+listener.onTaskEnd(SparkListenerTaskEnd(
   stageId = 0,
   stageAttemptId = 1,
   taskType = "",
   reason = null,
   createTaskInfo(0, 0, accums = accumulatorUpdates.mapValues(_ * 2)),
   null))
-bus.postToAll(SparkListenerTaskEnd(
+listener.onTaskEnd(SparkListenerTaskEnd(
   stageId = 0,
   stageAttemptId = 1,
   taskType = "",
   reason = null,
   createTaskInfo(1, 0, accums = accumulatorUpdates.mapValues(_ * 3)),
   null))
 

[GitHub] spark issue #19981: [SPARK-22786][SQL] only use AppStatusPlugin in history s...

2017-12-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19981
  
**[Test build #85091 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85091/testReport)**
 for PR 19981 at commit 
[`60421ac`](https://github.com/apache/spark/commit/60421acf5c652731d64ffad013bd4c179d6401b5).


---




[GitHub] spark pull request #19981: [SPARK-22786][SQL] only use AppStatusPlugin in hi...

2017-12-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19981#discussion_r157663315
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListenerSuite.scala
 ---
@@ -36,11 +36,12 @@ import org.apache.spark.sql.catalyst.util.quietly
 import org.apache.spark.sql.execution.{LeafExecNode, QueryExecution, 
SparkPlanInfo, SQLExecution}
 import org.apache.spark.sql.execution.metric.{SQLMetric, SQLMetrics}
 import org.apache.spark.sql.test.SharedSQLContext
-import org.apache.spark.status.config._
+import org.apache.spark.status.config.LIVE_ENTITY_UPDATE_PERIOD
 import org.apache.spark.util.{AccumulatorMetadata, JsonProtocol, 
LongAccumulator}
 import org.apache.spark.util.kvstore.InMemoryStore
 
-class SQLListenerSuite extends SparkFunSuite with SharedSQLContext with 
JsonTestUtils {
+
+class SQLAppStatusListenerSuite extends SparkFunSuite with 
SharedSQLContext with JsonTestUtils {
--- End diff --

Then it's an existing problem of `SQLListenerSuite`, as previously it 
didn't only test `SQLListener`. This should not stop us from renaming it after 
we rename `SQLListener`


---




[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

2017-12-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19985
  
**[Test build #85090 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85090/testReport)**
 for PR 19985 at commit 
[`75c1479`](https://github.com/apache/spark/commit/75c1479d62045f3b8f987d38d88fd2e6be88e57b).


---




[GitHub] spark pull request #19981: [SPARK-22786][SQL] only use AppStatusPlugin in hi...

2017-12-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19981#discussion_r157662992
  
--- Diff: 
core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala ---
@@ -322,15 +321,18 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
       (new InMemoryStore(), true)
     }
 
+    val plugins = ServiceLoader.load(
+      classOf[AppHistoryServerPlugin], Utils.getContextOrSparkClassLoader).asScala
     val trackingStore = new ElementTrackingStore(kvstore, conf)
--- End diff --

do we really need to limit the UI data for history server? cc @vanzin 


---




[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()

2017-12-18 Thread youngbink
Github user youngbink commented on the issue:

https://github.com/apache/spark/pull/20015
  
@HyukjinKwon  Just took a look at PR #14788. 

My point in mentioning those databases was just to give examples of a 
function that Spark doesn't support but other databases commonly do. (They 
all have a `date_trunc` which takes a `timestamp` and outputs a `timestamp`.)
As you said, we could extend `trunc` and simply create an alias `date_trunc`, 
but it's actually not that simple. For example, PR #14788 won't be able to 
handle the following commands correctly in PySpark:
```
from pyspark.sql import functions

df = spark.createDataFrame([('1997-02-28 05:02:11',)], ['d'])
df.select(functions.trunc(df.d, 'year').alias('year')).collect()
df.select(functions.trunc(df.d, 'SS').alias('SS')).collect()
```
This is because `trunc(string, string)` isn't correctly handled. We could 
find a way around this and get it working, but after having a discussion 
with @cloud-fan, @gatorsmile, @rednaxelafx and Reynold, we decided to add 
`date_trunc` to be compatible with Postgres for now.
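
For readers skimming the thread, here is a minimal sketch of the 
Postgres-style semantics being targeted: `date_trunc(fmt, ts)` takes a 
timestamp and returns a timestamp truncated to the given unit. This is plain 
Scala for illustration only, not the PR's implementation, and the unit names 
are assumptions:
```scala
import java.time.LocalDateTime
import java.time.temporal.ChronoUnit

// Truncate a timestamp to the given unit, returning a timestamp (not a date).
def dateTrunc(fmt: String, ts: LocalDateTime): LocalDateTime =
  fmt.toUpperCase match {
    case "YEAR"   => ts.truncatedTo(ChronoUnit.DAYS).withDayOfYear(1)
    case "MONTH"  => ts.truncatedTo(ChronoUnit.DAYS).withDayOfMonth(1)
    case "DAY"    => ts.truncatedTo(ChronoUnit.DAYS)
    case "HOUR"   => ts.truncatedTo(ChronoUnit.HOURS)
    case "MINUTE" => ts.truncatedTo(ChronoUnit.MINUTES)
    case "SECOND" => ts.truncatedTo(ChronoUnit.SECONDS)
    case other    => throw new IllegalArgumentException(s"Unsupported unit: $other")
  }

// dateTrunc("YEAR", LocalDateTime.parse("1997-02-28T05:02:11"))
// => 1997-01-01T00:00
```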







---




[GitHub] spark pull request #19149: [SPARK-21652][SQL][FOLLOW-UP] Fix rule conflict c...

2017-12-18 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19149#discussion_r157661943
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -47,7 +47,62 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
   protected def fixedPoint = FixedPoint(SQLConf.get.optimizerMaxIterations)
 
   def batches: Seq[Batch] = {
-Batch("Eliminate Distinct", Once, EliminateDistinct) ::
+val operatorOptimizationRuleSet =
+  Seq(
+// Operator push down
+PushProjectionThroughUnion,
+ReorderJoin,
+EliminateOuterJoin,
+PushPredicateThroughJoin,
+PushDownPredicate,
+LimitPushDown,
+ColumnPruning,
+InferFiltersFromConstraints,
--- End diff --

Yeah


---




[GitHub] spark issue #20014: [SPARK-22827][CORE] Avoid throwing OutOfMemoryError in c...

2017-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20014
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20014: [SPARK-22827][CORE] Avoid throwing OutOfMemoryError in c...

2017-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20014
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85081/
Test PASSed.


---




[GitHub] spark issue #20014: [SPARK-22827][CORE] Avoid throwing OutOfMemoryError in c...

2017-12-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20014
  
**[Test build #85081 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85081/testReport)**
 for PR 20014 at commit 
[`4254d3e`](https://github.com/apache/spark/commit/4254d3e35bf6d8c34435c7d7456d18e4b86a7e59).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19904: [SPARK-22707][ML] Optimize CrossValidator memory occupat...

2017-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19904
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19904: [SPARK-22707][ML] Optimize CrossValidator memory occupat...

2017-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19904
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85086/
Test PASSed.


---




[GitHub] spark issue #19904: [SPARK-22707][ML] Optimize CrossValidator memory occupat...

2017-12-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19904
  
**[Test build #85086 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85086/testReport)**
 for PR 19904 at commit 
[`ccd2689`](https://github.com/apache/spark/commit/ccd26895e749cce1acbe65a5507cba6b0e6c42d3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #20000: [SPARK-22815] [SQL] Keep PromotePrecision in Optimized P...

2017-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20000
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85082/
Test PASSed.


---




[GitHub] spark issue #20000: [SPARK-22815] [SQL] Keep PromotePrecision in Optimized P...

2017-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/20000
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #20000: [SPARK-22815] [SQL] Keep PromotePrecision in Optimized P...

2017-12-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20000
  
**[Test build #85082 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85082/testReport)**
 for PR 20000 at commit 
[`3f27c4b`](https://github.com/apache/spark/commit/3f27c4b839a352716c204d1580f17d6625f7bee4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19984: [SPARK-22789] Map-only continuous processing execution

2017-12-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19984
  
**[Test build #85089 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85089/testReport)**
 for PR 19984 at commit 
[`359ebdd`](https://github.com/apache/spark/commit/359ebdd8bdac0b93aa6b88beab0212393f1e2577).


---




[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column

2017-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19892
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column

2017-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19892
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85087/
Test PASSed.


---




[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column

2017-12-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19892
  
**[Test build #85087 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85087/testReport)**
 for PR 19892 at commit 
[`e1fb379`](https://github.com/apache/spark/commit/e1fb379ba710a7656d73f60c45c18dbf0761a98c).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19988: [Spark-22795] [ML] Raise error when line search in First...

2017-12-18 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/19988
  
Hm, that's a good point. It means the optimization isn't sure it's making 
progress anymore because the input is too weird, but I guess that doesn't 
say much about the actual quality of the solution.


---




[GitHub] spark issue #20012: [SPARK-22824] Restore old offset for binary compatibilit...

2017-12-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/20012
  
**[Test build #85088 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85088/testReport)**
 for PR 20012 at commit 
[`97fd2f9`](https://github.com/apache/spark/commit/97fd2f968d417bd7e690cf961efaac77bc2ef08d).


---




[GitHub] spark issue #19988: [Spark-22795] [ML] Raise error when line search in First...

2017-12-18 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19988
  
@srowen Wait... @jkbradley seems to have more thoughts about this.
Question: when line search fails, does it mean the model is always 
meaningless?
Maybe we need more discussion.


---




[GitHub] spark issue #19892: [SPARK-22797][PySpark] Bucketizer support multi-column

2017-12-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19892
  
**[Test build #85087 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85087/testReport)**
 for PR 19892 at commit 
[`e1fb379`](https://github.com/apache/spark/commit/e1fb379ba710a7656d73f60c45c18dbf0761a98c).


---




[GitHub] spark issue #19984: [SPARK-22789] Map-only continuous processing execution

2017-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19984
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #19984: [SPARK-22789] Map-only continuous processing execution

2017-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19984
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85079/
Test FAILed.


---




[GitHub] spark issue #19984: [SPARK-22789] Map-only continuous processing execution

2017-12-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19984
  
**[Test build #85079 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85079/testReport)**
 for PR 19984 at commit 
[`2af9b40`](https://github.com/apache/spark/commit/2af9b40d60027397fd6e915413b88f12249245e5).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #19904: [SPARK-22707][ML] Optimize CrossValidator memory occupat...

2017-12-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19904
  
**[Test build #85086 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85086/testReport)**
 for PR 19904 at commit 
[`ccd2689`](https://github.com/apache/spark/commit/ccd26895e749cce1acbe65a5507cba6b0e6c42d3).


---




[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-18 Thread CodingCat
Github user CodingCat commented on a diff in the pull request:

https://github.com/apache/spark/pull/19864#discussion_r157653784
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarQuerySuite.scala
 ---
@@ -479,4 +485,43 @@ class InMemoryColumnarQuerySuite extends QueryTest with SharedSQLContext {
       }
     }
   }
+
+  test("SPARK-22673: InMemoryRelation should utilize existing stats of the plan to be cached") {
+    withSQLConf("spark.sql.cbo.enabled" -> "true") {
+      withTempPath { workDir =>
+        withTable("table1") {
+          val workDirPath = workDir.getAbsolutePath
+          val data = Seq(100, 200, 300, 400).toDF("count")
+          data.write.parquet(workDirPath)
+          val dfFromFile = spark.read.parquet(workDirPath).cache()
+          val inMemoryRelation = dfFromFile.queryExecution.optimizedPlan.collect {
+            case plan: InMemoryRelation => plan
+          }.head
+          // InMemoryRelation's stats is file size before the underlying RDD is materialized
+          assert(inMemoryRelation.computeStats().sizeInBytes === 740)
+
+          // InMemoryRelation's stats is updated after materializing RDD
+          dfFromFile.collect()
+          assert(inMemoryRelation.computeStats().sizeInBytes === 16)
+
+          // test of catalog table
+          val dfFromTable = spark.catalog.createTable("table1", workDirPath).cache()
+          val inMemoryRelation2 = dfFromTable.queryExecution.optimizedPlan.
+            collect { case plan: InMemoryRelation => plan }.head
+
+          // Even CBO enabled, InMemoryRelation's stats keeps as the file size before table's stats
+          // is calculated
+          assert(inMemoryRelation2.computeStats().sizeInBytes === 740)
+
+          // InMemoryRelation's stats should be updated after calculating stats of the table
+          // clear cache to simulate a fresh environment
+          dfFromTable.unpersist(blocking = true)
+          spark.sql("ANALYZE TABLE table1 COMPUTE STATISTICS")
+          val inMemoryRelation3 = spark.read.table("table1").cache().queryExecution.optimizedPlan.
+            collect { case plan: InMemoryRelation => plan }.head
+          assert(inMemoryRelation3.computeStats().sizeInBytes === 48)
--- End diff --

because 16 is the `exact` in-memory size, which is obtained by reading the 
accumulator's value after evaluating the RDD.

48 is calculated by EstimationUtils: 
https://github.com/apache/spark/blob/bdb5e55c2a67d16a36ad6baa22296d714d3525af/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/EstimationUtils.scala#L78
 as `(8 + 4 (average attribute length)) * 4`
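
For reference, a restatement of the arithmetic above (variable names are 
illustrative, not from EstimationUtils itself):
```scala
val rowOverhead = 8                                     // fixed per-row overhead
val avgAttrLen  = 4                                     // average length of the single int attribute
val numRows     = 4                                     // Seq(100, 200, 300, 400)
val estimated   = (rowOverhead + avgAttrLen) * numRows  // = 48
// The exact 16 is read back from the accumulator after materialization,
// not derived from this formula.
```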


---




[GitHub] spark pull request #19950: [SPARK-22450][Core][MLLib][FollowUp] safely regis...

2017-12-18 Thread zhengruifeng
Github user zhengruifeng commented on a diff in the pull request:

https://github.com/apache/spark/pull/19950#discussion_r157653695
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/tree/impl/TreePointSuite.scala ---
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.tree.impl
+
+import org.apache.spark.{SparkConf, SparkFunSuite}
+import org.apache.spark.serializer.KryoSerializer
+
+class TreePointSuite extends SparkFunSuite {
+  test("Kryo class register") {
+    val conf = new SparkConf(false)
+    conf.set("spark.kryo.registrationRequired", "true")
+
+    val ser = new KryoSerializer(conf).newInstance()
+
+    def check(p: TreePoint): Unit = {
--- End diff --

I also think there is not much value in doing this, although the current 
test suites are all written like this.
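
For context, the `check` body cut off in the quote above is typically a 
serialization round trip; a sketch using the `ser` instance from the quoted 
test, assuming `TreePoint` exposes `label` and `binnedFeatures` (field names 
not verified against this PR):
```scala
def check(p: TreePoint): Unit = {
  // Round-trip through Kryo and compare field by field.
  val p2 = ser.deserialize[TreePoint](ser.serialize(p))
  assert(p.label == p2.label)
  assert(p.binnedFeatures.sameElements(p2.binnedFeatures))
}
```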


---




[GitHub] spark issue #19350: [SPARK-22126][ML][WIP] Fix model-specific optimization s...

2017-12-18 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19350
  
Design changed. I will create a new PR for this later. The new design is here: 
https://docs.google.com/document/d/1xw5M4sp1e0eQie75yIt-r6-GTuD5vpFf_I6v-AFBM3M/edit?usp=sharing



---




[GitHub] spark pull request #19350: [SPARK-22126][ML][WIP] Fix model-specific optimiz...

2017-12-18 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at:

https://github.com/apache/spark/pull/19350


---




[GitHub] spark issue #19857: [SPARK-22667][ML][WIP] Fix model-specific optimization s...

2017-12-18 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/19857
  
The design of this issue changed. @MrBago will take this over.


---




[GitHub] spark pull request #19857: [SPARK-22667][ML][WIP] Fix model-specific optimiz...

2017-12-18 Thread WeichenXu123
Github user WeichenXu123 closed the pull request at:

https://github.com/apache/spark/pull/19857


---




[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...

2017-12-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19950
  
**[Test build #85085 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85085/testReport)**
 for PR 19950 at commit 
[`6f51096`](https://github.com/apache/spark/commit/6f51096060645abeb0e9b9568bca6dbb6213b16a).


---




[GitHub] spark issue #20015: [SPARK-22829] Add new built-in function date_trunc()

2017-12-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/20015
  
@cloud-fan and @youngbink how about reviving 
https://github.com/apache/spark/pull/14788 with a configuration to control this?

AWS Redshift seems to have a `TRUNC` which just converts a timestamp to a 
date, whereas Spark's `trunc` supports date formats. These are not quite 
equivalent; I think Spark's `trunc` is more like Redshift's `DATE_TRUNC`.

PostgreSQL does not have `trunc` but has `date_trunc`, where we can specify 
the format, and it always returns a timestamp.

Presto also does not appear to have duplicated functionality.

I think we can simply introduce an alias for `trunc` after resolving 
https://github.com/apache/spark/pull/14788.

Did I maybe miss something?


---




[GitHub] spark issue #19950: [SPARK-22450][Core][MLLib][FollowUp] safely register cla...

2017-12-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/19950
  
**[Test build #85084 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85084/testReport)**
 for PR 19950 at commit 
[`daba630`](https://github.com/apache/spark/commit/daba630343cb1d7f3ad137a75aaffb2a29b99cb6).


---




[GitHub] spark issue #19933: [SPARK-22744][CORE] Add a configuration to show the appl...

2017-12-18 Thread LantaoJin
Github user LantaoJin commented on the issue:

https://github.com/apache/spark/pull/19933
  
Thanks for taking so much time to discuss this with me, @srowen. Like 
MapReduce does, the submit host should be set automatically in the submit 
logic rather than expecting the Spark application owner to set it. Most 
application owners don't care about it, so they won't set it in the 
configuration. But if it can be found in the Web UI, it is very helpful to 
the platform/infra team. So the code I contributed,
`sparkConf.set("spark.submit.hostname", Utils.localHostName)` in 
SparkSubmit.scala, has no side effect for application owners.


---




[GitHub] spark pull request #19864: [SPARK-22673][SQL] InMemoryRelation should utiliz...

2017-12-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19864#discussion_r157652482
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/columnar/InMemoryColumnarQuerySuite.scala
 ---
@@ -479,4 +485,43 @@ class InMemoryColumnarQuerySuite extends QueryTest with SharedSQLContext {
       }
     }
   }
+
+  test("SPARK-22673: InMemoryRelation should utilize existing stats of the plan to be cached") {
+    withSQLConf("spark.sql.cbo.enabled" -> "true") {
+      withTempPath { workDir =>
+        withTable("table1") {
+          val workDirPath = workDir.getAbsolutePath
+          val data = Seq(100, 200, 300, 400).toDF("count")
+          data.write.parquet(workDirPath)
+          val dfFromFile = spark.read.parquet(workDirPath).cache()
+          val inMemoryRelation = dfFromFile.queryExecution.optimizedPlan.collect {
+            case plan: InMemoryRelation => plan
+          }.head
+          // InMemoryRelation's stats is file size before the underlying RDD is materialized
+          assert(inMemoryRelation.computeStats().sizeInBytes === 740)
+
+          // InMemoryRelation's stats is updated after materializing RDD
+          dfFromFile.collect()
+          assert(inMemoryRelation.computeStats().sizeInBytes === 16)
+
+          // test of catalog table
+          val dfFromTable = spark.catalog.createTable("table1", workDirPath).cache()
+          val inMemoryRelation2 = dfFromTable.queryExecution.optimizedPlan.
+            collect { case plan: InMemoryRelation => plan }.head
+
+          // Even CBO enabled, InMemoryRelation's stats keeps as the file size before table's stats
+          // is calculated
+          assert(inMemoryRelation2.computeStats().sizeInBytes === 740)
+
+          // InMemoryRelation's stats should be updated after calculating stats of the table
+          // clear cache to simulate a fresh environment
+          dfFromTable.unpersist(blocking = true)
+          spark.sql("ANALYZE TABLE table1 COMPUTE STATISTICS")
+          val inMemoryRelation3 = spark.read.table("table1").cache().queryExecution.optimizedPlan.
+            collect { case plan: InMemoryRelation => plan }.head
+          assert(inMemoryRelation3.computeStats().sizeInBytes === 48)
--- End diff --

missed this one: why does it have different stats than the table cache 
stats, `16`?


---




[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

2017-12-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19985#discussion_r157652242
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/DataSourceScanExecRedactionSuite.scala
 ---
@@ -52,4 +53,40 @@ class DataSourceScanExecRedactionSuite extends QueryTest with SharedSQLContext {
       assert(df.queryExecution.simpleString.contains(replacement))
     }
   }
+
+  private def isIncluded(queryExecution: QueryExecution, msg: String): Boolean = {
+    queryExecution.toString.contains(msg) ||
+    queryExecution.simpleString.contains(msg) ||
+    queryExecution.stringWithStats.contains(msg)
+  }
+
+  test("explain is redacted using SQLConf") {
+    withTempDir { dir =>
+      val basePath = dir.getCanonicalPath
+      spark.range(0, 10).toDF("a").write.parquet(new Path(basePath, "foo=1").toString)
+      val df = spark.read.parquet(basePath)
+      val replacement = "*"
+
+      // Respect SparkConf and replace file:/
+      assert(isIncluded(df.queryExecution, replacement))
+
+      assert(isIncluded(df.queryExecution, "FileScan"))
+      assert(!isIncluded(df.queryExecution, "file:/"))
+
+      withSQLConf(SQLConf.SQL_STRING_REDACTION_PATTERN.key -> "(?i)FileScan") {
+        // Respect SQLConf and replace FileScan
+        assert(isIncluded(df.queryExecution, replacement))
+
+        assert(!isIncluded(df.queryExecution, "FileScan"))
+        assert(isIncluded(df.queryExecution, "file:/"))
+      }
+
+      // Respect SparkConf and replace file:/
+      assert(isIncluded(df.queryExecution, replacement))
+
+      assert(isIncluded(df.queryExecution, "FileScan"))
+      assert(!isIncluded(df.queryExecution, "file:/"))
--- End diff --

what is the difference between these 3 lines and 
https://github.com/apache/spark/pull/19985/files#diff-0c515221ed6e6eadcec71b3b9ad3a3e1R70
 ?


---




[GitHub] spark pull request #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

2017-12-18 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/19985#discussion_r157652152
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/QueryExecution.scala ---
@@ -231,6 +231,13 @@ class QueryExecution(val sparkSession: SparkSession, val logical: LogicalPlan) {
 """.stripMargin.trim
   }
 
+  /**
+   * Redact the sensitive information in the given string.
+   */
+  private def withRedaction(message: => String): String = {
--- End diff --

`=> String` doesn't look very useful here; we have to materialize the 
string anyway when calling `Utils.redact`.
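
For context, a minimal self-contained sketch of the point (helper names 
hypothetical): a by-name parameter only pays off when the argument may go 
unevaluated.
```scala
def eager(msg: String): Int = msg.length      // caller builds the string up front
def byName(msg: => String): Int = msg.length  // string is still built, just inside the callee

// By-name is useful only when evaluation can be skipped entirely:
def logDebug(msg: => String, enabled: Boolean): Unit =
  if (enabled) println(msg)                   // msg is never constructed when enabled == false
```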


---




[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

2017-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19985
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85077/
Test PASSed.


---




[GitHub] spark issue #19985: [SPARK-22791] [SQL] [SS] Redact Output of Explain

2017-12-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/19985
  
Merged build finished. Test PASSed.


---




[GitHub] spark pull request #20008: [SPARK-22822][TEST] Basic tests for FunctionArgum...

2017-12-18 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/20008#discussion_r157651436
  
--- Diff: 
sql/core/src/test/resources/sql-tests/inputs/typeCoercion/native/decimalPrecision.sql
 ---
@@ -0,0 +1,6883 @@
+--
+--   Licensed to the Apache Software Foundation (ASF) under one or more
+--   contributor license agreements.  See the NOTICE file distributed with
+--   this work for additional information regarding copyright ownership.
+--   The ASF licenses this file to You under the Apache License, Version 2.0
+--   (the "License"); you may not use this file except in compliance with
+--   the License.  You may obtain a copy of the License at
+--
+--  http://www.apache.org/licenses/LICENSE-2.0
+--
+--   Unless required by applicable law or agreed to in writing, software
+--   distributed under the License is distributed on an "AS IS" BASIS,
+--   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+--   See the License for the specific language governing permissions and
+--   limitations under the License.
+--
+
+CREATE TEMPORARY VIEW t AS SELECT 1;
+
+SELECT cast(1 as tinyint) + cast(1 as decimal(1, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(3, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(4, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(5, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(6, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(10, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(11, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(20, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(21, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(38, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(39, 0)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(1, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(2, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(3, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(4, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(5, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(6, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(10, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(11, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(20, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(21, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(38, 1)) FROM t;
+SELECT cast(1 as tinyint) + cast(1 as decimal(39, 1)) FROM t;
--- End diff --

How about only these 4 decimals: `DECIMAL(3, 0)`, `DECIMAL(5, 0)`, 
`DECIMAL(10, 0)` and `DECIMAL(20, 0)`.

https://github.com/apache/spark/blob/00d176d2fe7bbdf55cb3146a9cb04ca99b1858b7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala#L54-L57
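
For context, those four widths line up with the smallest decimals that can 
hold each integral type exactly (per the linked `DecimalPrecision` lines):
```scala
// Integral type -> minimal decimal that covers its full range:
//   ByteType    (-128 .. 127)             -> DECIMAL(3, 0)
//   ShortType   (-32768 .. 32767)         -> DECIMAL(5, 0)
//   IntegerType (10 digits)               -> DECIMAL(10, 0)
//   LongType    (up to 19 digits + sign)  -> DECIMAL(20, 0)
```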


---



