[GitHub] spark pull request #23275: [SPARK-26323][SQL] Scala UDF should still check i...
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23275#discussion_r240234573

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
    @@ -47,25 +47,13 @@ case class ScalaUDF(
         function: AnyRef,
         dataType: DataType,
         children: Seq[Expression],
    -    inputsNullSafe: Seq[Boolean],
    -    inputTypes: Seq[DataType] = Nil,
    +    @transient inputsNullSafe: Seq[Boolean],
    +    @transient inputTypes: Seq[AbstractDataType] = Nil,
         udfName: Option[String] = None,
         nullable: Boolean = true,
         udfDeterministic: Boolean = true)
       extends Expression with ImplicitCastInputTypes with NonSQLExpression with UserDefinedExpression {
    -  // The constructor for SPARK 2.1 and 2.2
    --- End diff --

I'm OK removing it even without a formal deprecation; these versions are EOL.

---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23275#discussion_r240234371

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
    @@ -47,25 +47,13 @@ case class ScalaUDF(
         function: AnyRef,
         dataType: DataType,
         children: Seq[Expression],
    -    inputsNullSafe: Seq[Boolean],
    +    @transient inputsNullSafe: Seq[Boolean],
    --- End diff --

What was the need for this one? Does this object get caught in a closure somewhere?
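[Editor's note] For context on the question, `@transient` marks a field to be skipped by JVM serialization, which is what saves work if the expression object is captured in a serialized task closure. A minimal, self-contained sketch using plain Java serialization — the `Udf` class here is a hypothetical stand-in, not Spark's actual `ScalaUDF`:

```scala
import java.io._

// Hypothetical stand-in for an expression that carries metadata needed
// only at driver-side analysis time, not on executors.
case class Udf(name: String, @transient inputTypes: Seq[String])

object TransientDemo {
  // Serialize and deserialize a value with plain Java serialization.
  def roundTrip[T](v: T): T = {
    val bytes = new ByteArrayOutputStream()
    new ObjectOutputStream(bytes).writeObject(v)
    val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    in.readObject().asInstanceOf[T]
  }

  def main(args: Array[String]): Unit = {
    val copy = roundTrip(Udf("f", Seq("int")))
    println(copy.name)        // "f" — ordinary field survives the round trip
    println(copy.inputTypes)  // null — the @transient field was not serialized
  }
}
```

The trade-off is that a `@transient` field is `null` after deserialization, so anything that runs on the executor side must not depend on it.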
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23275#discussion_r240234583

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/expressions/UserDefinedFunction.scala ---
    @@ -88,68 +88,49 @@ sealed trait UserDefinedFunction {
     private[sql] case class SparkUserDefinedFunction(
         f: AnyRef,
         dataType: DataType,
    -    inputTypes: Option[Seq[DataType]],
    -    nullableTypes: Option[Seq[Boolean]],
    +    inputSchemas: Seq[Option[ScalaReflection.Schema]],
         name: Option[String] = None,
         nullable: Boolean = true,
         deterministic: Boolean = true)
       extends UserDefinedFunction {

       @scala.annotation.varargs
    -  override def apply(exprs: Column*): Column = {
    -    // TODO: make sure this class is only instantiated through `SparkUserDefinedFunction.create()`
    -    // and `nullableTypes` is always set.
    -    if (inputTypes.isDefined) {
    -      assert(inputTypes.get.length == nullableTypes.get.length)
    -    }
    -
    -    val inputsNullSafe = nullableTypes.getOrElse {
    -      ScalaReflection.getParameterTypeNullability(f)
    --- End diff --

Not worth keeping anymore, as Scala 2.12 is the default now.
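[Editor's note] The `getParameterTypeNullability` fallback being removed existed because a parameter of a primitive (`AnyVal`) type such as `Int` cannot hold `null`, so the engine needs per-parameter null-safety flags to guard inputs. A hedged sketch of that idea — the `nullSafe` helper below is hypothetical, not Spark's API:

```scala
object NullSafetyDemo {
  // Hypothetical helper: a parameter can safely receive null only if its
  // JVM representation is a reference type; primitives cannot hold null.
  def nullSafe(clazz: Class[_]): Boolean = !clazz.isPrimitive

  def main(args: Array[String]): Unit = {
    println(nullSafe(classOf[Int]))    // false: Scala Int maps to primitive int
    println(nullSafe(classOf[String])) // true: reference types can be null
  }
}
```

When a parameter is not null-safe, the caller has to check for null before invoking the user function rather than let a `NullPointerException` (or a silent unboxing to 0) occur inside it.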
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23275#discussion_r240231883

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala ---
    @@ -4255,11 +4255,11 @@ object functions {
        *
        * @group udf_funcs
        * @since 2.0.0
    +   *
    +   * @deprecated("please use the typed `udf` methods", "3.0.0")
    --- End diff --

With Scala 2.12, type and nullability info needs to be retrieved at compile time, so this method is not very useful and we should deprecate it.
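[Editor's note] A sketch of the contrast behind this deprecation, assuming the standard `functions.udf` overloads in Spark's Scala API (shown for the API shape only; running it requires a SparkSession):

```scala
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.IntegerType

object UdfOverloads {
  // Typed variant: parameter and return types (and hence primitive
  // nullability) are captured from the function signature at compile time.
  val plusOne = udf((x: Int) => x + 1)

  // Untyped variant (the one being deprecated): only the return type is
  // supplied, so input types and nullability cannot be checked.
  val plusOneUntyped = udf((x: Any) => x.asInstanceOf[Int] + 1, IntegerType)
}
```

With the untyped overload the analyzer has no input-type information at all, which is exactly the gap this PR's type-checking change is working around.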
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23275#discussion_r240230970

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala ---
    @@ -47,25 +47,13 @@ case class ScalaUDF(
         function: AnyRef,
         dataType: DataType,
         children: Seq[Expression],
    -    inputsNullSafe: Seq[Boolean],
    -    inputTypes: Seq[DataType] = Nil,
    +    @transient inputsNullSafe: Seq[Boolean],
    +    @transient inputTypes: Seq[AbstractDataType] = Nil,
         udfName: Option[String] = None,
         nullable: Boolean = true,
         udfDeterministic: Boolean = true)
       extends Expression with ImplicitCastInputTypes with NonSQLExpression with UserDefinedExpression {
    -  // The constructor for SPARK 2.1 and 2.2
    --- End diff --

Not useful anymore in Spark 3.0.
GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/23275

[SPARK-26323][SQL] Scala UDF should still check input types even if some inputs are of type Any

## What changes were proposed in this pull request?

For a Scala UDF, when checking input nullability, we skip inputs of type `Any` and only check the inputs that provide nullability info. We should do the same when checking input types.

## How was this patch tested?

New tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark udf

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23275.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #23275

commit 8582607195f12a4c133fb28b59e8a7fce7a97fbb
Author: Wenchen Fan
Date:   2018-12-10T13:00:17Z

    Scala UDF should still check input types even if some inputs are of type Any
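[Editor's note] The rule the PR description spells out — check the slots you have type info for, skip the slots typed `Any` — can be sketched independently of Spark's type system. The types and helper below are simplified stand-ins, not Spark's actual `AbstractDataType` machinery:

```scala
// Simplified stand-ins for SQL data types; AnyDType marks "no type info".
sealed trait DType
case object AnyDType extends DType
case object IntDType extends DType
case object StringDType extends DType

object InputTypeCheck {
  // Compare each actual input type against the expected one, but skip
  // slots where the expectation is Any (nothing to check against).
  def checkInputTypes(expected: Seq[DType], actual: Seq[DType]): Either[String, Unit] = {
    val mismatch = expected.zip(actual).zipWithIndex.collectFirst {
      case ((exp, act), i) if exp != AnyDType && exp != act =>
        s"argument $i expects $exp but got $act"
    }
    mismatch.toLeft(())
  }

  def main(args: Array[String]): Unit = {
    // Slot 1 is Any, so its mismatch is ignored:
    println(checkInputTypes(Seq(IntDType, AnyDType), Seq(IntDType, StringDType)))
    // Slot 1 has a concrete expectation, so the mismatch is reported:
    println(checkInputTypes(Seq(IntDType, IntDType), Seq(IntDType, StringDType)))
  }
}
```

This mirrors the symmetry the PR title asks for: nullability checking already skipped `Any` slots, and type checking should apply the same skip rule rather than fail or skip all checking.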