[GitHub] spark pull request #23275: [SPARK-26323][SQL] Scala UDF should still check i...

2018-12-10 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/23275#discussion_r240234573
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
 ---
@@ -47,25 +47,13 @@ case class ScalaUDF(
 function: AnyRef,
 dataType: DataType,
 children: Seq[Expression],
-inputsNullSafe: Seq[Boolean],
-inputTypes: Seq[DataType] = Nil,
+@transient inputsNullSafe: Seq[Boolean],
+@transient inputTypes: Seq[AbstractDataType] = Nil,
 udfName: Option[String] = None,
 nullable: Boolean = true,
 udfDeterministic: Boolean = true)
   extends Expression with ImplicitCastInputTypes with NonSQLExpression 
with UserDefinedExpression {
 
-  // The constructor for SPARK 2.1 and 2.2
--- End diff --

I'm OK removing it even without a formal deprecation; these versions are EOL


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23275: [SPARK-26323][SQL] Scala UDF should still check i...

2018-12-10 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/23275#discussion_r240234371
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
 ---
@@ -47,25 +47,13 @@ case class ScalaUDF(
 function: AnyRef,
 dataType: DataType,
 children: Seq[Expression],
-inputsNullSafe: Seq[Boolean],
-inputTypes: Seq[DataType] = Nil,
+@transient inputsNullSafe: Seq[Boolean],
--- End diff --

What was the need for this one? does this object get caught in a closure 
somewhere?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23275: [SPARK-26323][SQL] Scala UDF should still check i...

2018-12-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23275#discussion_r240234583
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala
 ---
@@ -88,68 +88,49 @@ sealed trait UserDefinedFunction {
 private[sql] case class SparkUserDefinedFunction(
 f: AnyRef,
 dataType: DataType,
-inputTypes: Option[Seq[DataType]],
-nullableTypes: Option[Seq[Boolean]],
+inputSchemas: Seq[Option[ScalaReflection.Schema]],
 name: Option[String] = None,
 nullable: Boolean = true,
 deterministic: Boolean = true) extends UserDefinedFunction {
 
   @scala.annotation.varargs
-  override def apply(exprs: Column*): Column = {
-// TODO: make sure this class is only instantiated through 
`SparkUserDefinedFunction.create()`
-// and `nullableTypes` is always set.
-if (inputTypes.isDefined) {
-  assert(inputTypes.get.length == nullableTypes.get.length)
-}
-
-val inputsNullSafe = nullableTypes.getOrElse {
-  ScalaReflection.getParameterTypeNullability(f)
--- End diff --

Not worth to keep it anymore, as Scala 2.12 is the default now.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23275: [SPARK-26323][SQL] Scala UDF should still check i...

2018-12-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23275#discussion_r240231883
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
@@ -4255,11 +4255,11 @@ object functions {
*
* @group udf_funcs
* @since 2.0.0
+   *
+   * @deprecated("please use the typed `udf` methods", "3.0.0")
--- End diff --

with Scala 2.12, type and nullability info need to be retrieved during 
compile time, this method is not very useful and we should deprecate it.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23275: [SPARK-26323][SQL] Scala UDF should still check i...

2018-12-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/23275#discussion_r240230970
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
 ---
@@ -47,25 +47,13 @@ case class ScalaUDF(
 function: AnyRef,
 dataType: DataType,
 children: Seq[Expression],
-inputsNullSafe: Seq[Boolean],
-inputTypes: Seq[DataType] = Nil,
+@transient inputsNullSafe: Seq[Boolean],
+@transient inputTypes: Seq[AbstractDataType] = Nil,
 udfName: Option[String] = None,
 nullable: Boolean = true,
 udfDeterministic: Boolean = true)
   extends Expression with ImplicitCastInputTypes with NonSQLExpression 
with UserDefinedExpression {
 
-  // The constructor for SPARK 2.1 and 2.2
--- End diff --

not useful anymore in Spark 3.0.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23275: [SPARK-26323][SQL] Scala UDF should still check i...

2018-12-10 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/23275

[SPARK-26323][SQL] Scala UDF should still check input types even if some 
inputs are of type Any

## What changes were proposed in this pull request?

For Scala UDF, when checking input nullability, we will skip inputs with 
type `Any`, and only check the inputs that provide nullability info.

We should do the same for checking input types.

## How was this patch tested?

new tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark udf

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23275.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23275


commit 8582607195f12a4c133fb28b59e8a7fce7a97fbb
Author: Wenchen Fan 
Date:   2018-12-10T13:00:17Z

Scala UDF should still check input types even if some inputs are of type Any




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org