fqaiser94 commented on a change in pull request #27066:
URL: https://github.com/apache/spark/pull/27066#discussion_r430714578



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
##########
@@ -515,3 +515,96 @@ case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: E
 
   override def prettyName: String = "str_to_map"
 }
+
+/**
+ * Adds/replaces field in struct by name.
+ *
+ * @param children Seq(struct, name, val)
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(struct, name, val) - Adds/replaces field in struct by name.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "c", 3);
+       {"a":1,"b":2,"c":3}
+      > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "b", 3);
+       {"a":1,"b":3}
+      > SELECT _FUNC_(CAST(NULL AS struct<a:int,b:int>), "c", 3);
+       {"a":null,"b":null,"c":3}

Review comment:
       > Most SQL functions return null if the input is null.
   
   Agreed, this is true in the majority of cases.
   
   > Do we have a valid real-world use case to justify this behavior?
   
   The justification for this behaviour is not based on any particular use case. Originally, the code in this PR did exactly what you're suggesting, i.e. null input returns null output. However, based on feedback on the PR, I switched to the current behaviour for both simplicity of implementation and easier optimization of the resulting plans. Please refer to the discussion [here](https://github.com/apache/spark/pull/27066#issuecomment-609512144) and [here](https://github.com/apache/spark/pull/27066#issuecomment-609671222) for more details.
   IMO this is an acceptable compromise: users who want null-in, null-out semantics can always wrap this expression in a case-when themselves, as sketched below.
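   As a minimal sketch of that workaround (assuming the `Column.withField` API under review in this PR, plus a hypothetical DataFrame `df` with a nullable struct column `a` of type `struct<a:int,b:int>`):
   
   ```scala
   import org.apache.spark.sql.functions.{col, lit, when}
   
   // Wrap withField in a case-when to restore null-in, null-out semantics:
   // if the input struct is null, return null; otherwise add/replace field "c".
   val result = df.select(
     when(col("a").isNull, lit(null))
       .otherwise(col("a").withField("c", lit(3)))
       .as("a"))
   ```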
   
   

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/ColumnExpressionSuite.scala
##########
@@ -923,4 +923,452 @@ class ColumnExpressionSuite extends QueryTest with SharedSparkSession {
     val inSet = InSet(Literal("a"), Set("a", "b").map(UTF8String.fromString))
     assert(inSet.sql === "('a' IN ('a', 'b'))")
   }
+
+  {

Review comment:
       No, it's not necessary. I did it because I saw no reason for the withField-specific data and methods to be accessible outside of this scope. A sketch of the idiom follows.
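   
   For illustration, a minimal sketch of that idiom (all names here are hypothetical, not from the PR): an anonymous initializer block inside a class keeps fixture data and helpers visible only to the code defined within it.
   
   ```scala
   class ExampleSuite {
     {
       // Visible only inside this block.
       val fixture = Seq(1, 2, 3)
       def helper(xs: Seq[Int]): Int = xs.sum
   
       assert(helper(fixture) == 6) // stands in for the test(...) registrations
     }
     // `fixture` and `helper` are not in scope here.
   }
   ```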

##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala
##########
@@ -515,3 +515,96 @@ case class StringToMap(text: Expression, pairDelim: Expression, keyValueDelim: E
 
   override def prettyName: String = "str_to_map"
 }
+
+/**
+ * Adds/replaces field in struct by name.
+ *
+ * @param children Seq(struct, name, val)
+ */
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(struct, name, val) - Adds/replaces field in struct by name.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "c", 3);
+       {"a":1,"b":2,"c":3}
+      > SELECT _FUNC_(NAMED_STRUCT("a", 1, "b", 2), "b", 3);
+       {"a":1,"b":3}
+      > SELECT _FUNC_(CAST(NULL AS struct<a:int,b:int>), "c", 3);
+       {"a":null,"b":null,"c":3}
+      > SELECT _FUNC_(NAMED_STRUCT('a', 1, 'b', 2, 'b', 3), 'b', 100);
+       {"a":1,"b":100,"b":100}
+      > SELECT _FUNC_(a, 'a.c', 3) FROM (VALUES (NAMED_STRUCT('a', NAMED_STRUCT('a', 1, 'b', 2))) AS T(a));
+       {"a":{"a":1,"b":2,"c":3}}
+  """)
+// scalastyle:on line.size.limit
+case class WithField(children: Seq[Expression]) extends Unevaluable {
+  private lazy val Seq(structExpr, nestedAttributeExpr, valExpr) = children
+  private lazy val nestedAttribute = nestedAttributeExpr.eval().toString
+  private lazy val attributes = UnresolvedAttribute.parseAttributeName(nestedAttribute)
+  lazy val toCreateNamedStruct: Expression = toCreateNamedStruct(structExpr, attributes)
+
+  override def dataType: StructType = toCreateNamedStruct.dataType.asInstanceOf[StructType]
+
+  override def foldable: Boolean = toCreateNamedStruct.foldable
+
+  override def nullable: Boolean = toCreateNamedStruct.nullable
+
+  override def checkInputDataTypes(): TypeCheckResult = {
+    val structTypeName = StructType(Nil).typeName
+    if (children.size != 3) {
+      TypeCheckResult.TypeCheckFailure(s"$prettyName expects 3 arguments.")
+    } else if (!structExpr.dataType.isInstanceOf[StructType]) {
+      TypeCheckResult.TypeCheckFailure(s"Only $structTypeName is allowed to 
appear at " +
+        s"first position, got: ${structExpr.dataType.typeName}.")
+    } else if (!(nestedAttributeExpr.foldable && nestedAttributeExpr.dataType 
== StringType)) {
+      TypeCheckResult.TypeCheckFailure(s"Only foldable 
${StringType.catalogString} expressions " +
+        s"are allowed to appear at second position.")
+    } else if (nestedAttribute == null) {
+      TypeCheckResult.TypeCheckFailure("Field name should not be null.")
+    } else {
+      val structType = structExpr.dataType.asInstanceOf[StructType]
+      val intermediateAttributes :+ _ = attributes
+      intermediateAttributes

Review comment:
       I use `StructType.findNestedField` two lines below this one. I'm not sure how I would reuse `StructType.findNestedField` for this line, though. Could you be more specific?
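   
   For context, a hedged illustration (not PR code) of the `init :+ last` destructuring used on the line under review:
   
   ```scala
   // A pattern definition with the :+ extractor splits a Seq into all
   // elements but the last, plus the last element (discarded here via _).
   val attributes = Seq("a", "b", "c")
   val intermediateAttributes :+ _ = attributes
   // intermediateAttributes == Seq("a", "b")
   ```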



