maropu commented on a change in pull request #28979:
URL: https://github.com/apache/spark/pull/28979#discussion_r449759889



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -2819,13 +2819,12 @@ class Analyzer(
 
       case p => p transformExpressionsUp {
 
-        case udf @ ScalaUDF(_, _, inputs, _, _, _, _)
-            if udf.inputPrimitives.contains(true) =>
+        case udf: ScalaUDF if udf.inputPrimitives.contains(true) =>

Review comment:
       (This is not related to this PR though) To avoid recomputation, could we 
use `val` for `inputPrimitives` instead of `def`?
   
https://github.com/apache/spark/blob/42f01e314b4874236544cc8b94bef766269385ee/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala#L65
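      For illustration, a minimal self-contained sketch (toy code, not the actual `ScalaUDF` implementation) of why this matters: a `def` body is re-evaluated on every access, while a `val` is evaluated only once at construction (a `lazy val` would defer that single evaluation to first access):

```scala
// Toy probe, unrelated to Spark internals: counts how often each body runs.
object ValVsDef {
  def main(args: Array[String]): Unit = {
    var defRuns = 0
    var valRuns = 0
    class Probe {
      def viaDef: Seq[Boolean] = { defRuns += 1; Seq(true, false) } // re-evaluated on every access
      val viaVal: Seq[Boolean] = { valRuns += 1; Seq(true, false) } // evaluated once, at construction
    }
    val p = new Probe
    p.viaDef; p.viaDef
    p.viaVal; p.viaVal
    println(s"def ran $defRuns times, val ran $valRuns time") // def ran 2 times, val ran 1 time
  }
}
```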

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
##########
@@ -36,6 +36,8 @@ import org.apache.spark.sql.types.{AbstractDataType, AnyDataType, DataType, User
  * @param inputEncoders ExpressionEncoder for each input parameters. For a input parameter which
  *                      serialized as struct will use encoder instead of CatalystTypeConverters to
  *                      convert internal value to Scala value.
+ * @param returnEncoder ExpressionEncoder for the return type of function. It's only defined when

Review comment:
       How about `returnEncoder` -> `outputEncoder`?
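      Tangentially, here is a hedged sketch of the distinction the `@param inputEncoders` doc above describes; the names are illustrative rather than the PR's code. A flat type can go through `CatalystTypeConverters`, while a struct-serialized input such as a case class needs its `ExpressionEncoder`:

```scala
import org.apache.spark.sql.catalyst.CatalystTypeConverters
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.types.IntegerType

object EncoderVsConverter {
  // Flat type: the generic converter path is enough.
  val intToScala: Any => Any = CatalystTypeConverters.createToScalaConverter(IntegerType)

  // Struct-serialized input: only an ExpressionEncoder knows how to rebuild the
  // Scala object from its internal row representation.
  case class Point(x: Int, y: Int)
  val pointEncoder: ExpressionEncoder[Point] = ExpressionEncoder[Point]()
}
```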

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
##########
@@ -102,6 +105,28 @@ case class ScalaUDF(
     }
   }
 
+  /**
+   * Create the converter which converts the scala data type to the catalyst data type for
+   * the return data type of udf function. We'd use `ExpressionEncoder` to create the
+   * converter for typed ScalaUDF only, since its the only case where we know the  type tag

Review comment:
      nit: there's an extra space in `the  type tag`

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala
##########
@@ -305,6 +305,14 @@ case class ExpressionEncoder[T](
     StructField(s.name, s.dataType, s.nullable)
   })
 
+  def dataTypeAndNullable(): (DataType, Boolean) = {

Review comment:
       drop `()`? https://github.com/databricks/scala-style-guide#parentheses
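      For reference, a toy illustration (unrelated to Spark internals) of the convention that guide describes: methods with side effects are declared with `()`, pure accessors without, so call sites read accordingly:

```scala
// Toy example of the parentheses convention.
class Counter {
  private var n = 0
  def increment(): Unit = { n += 1 } // has a side effect => declared with ()
  def current: Int = n               // pure accessor      => declared without ()
}

object CounterDemo extends App {
  val c = new Counter
  c.increment()      // parentheses signal mutation
  println(c.current) // reads like a field access: prints 1
}
```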

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/ScalaUDF.scala
##########
@@ -102,6 +105,28 @@ case class ScalaUDF(
     }
   }
 
+  /**
+   * Create the converter which converts the scala data type to the catalyst data type for
+   * the return data type of udf function. We'd use `ExpressionEncoder` to create the
+   * converter for typed ScalaUDF only, since its the only case where we know the  type tag
+   * of the return data type of udf function.
+   * @param dataType return type of function
+   * @return the catalyst converter
+   */

Review comment:
      How about `catalystConverter` -> `createToCatalystConverter`?
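      To make the suggested name concrete, a rough sketch of the selection logic the doc comment describes; `outputEncoderConverter` is a hypothetical stand-in for the encoder-derived path, and only the fallback call is an existing Spark API:

```scala
import org.apache.spark.sql.catalyst.CatalystTypeConverters
import org.apache.spark.sql.types.DataType

object ConverterSketch {
  // Hedged sketch, not the PR's implementation: prefer an encoder-derived
  // converter when the typed ScalaUDF supplies one, otherwise fall back to the
  // generic CatalystTypeConverters path.
  def createToCatalystConverter(
      dataType: DataType,
      outputEncoderConverter: Option[Any => Any]): Any => Any =
    outputEncoderConverter.getOrElse(
      CatalystTypeConverters.createToCatalystConverter(dataType))
}
```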



