szehon-ho commented on code in PR #53360:
URL: https://github.com/apache/spark/pull/53360#discussion_r2621155965
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AssignmentUtils.scala:
##########
@@ -223,6 +301,77 @@ object AssignmentUtils extends SQLConfHelper with CastSupport {
CreateNamedStruct(namedStructExprs)
}
+  /**
+   * Checks if target struct has extra fields compared to source struct, recursively.
+   */
+  private def hasExtraTargetFields(targetType: StructType, sourceType: DataType): Boolean = {
+    sourceType match {
+      case sourceStructType: StructType =>
+        targetType.fields.exists { targetField =>
+          sourceStructType.fields.find(f => conf.resolver(f.name, targetField.name)) match {
+            case Some(sourceField) =>
+              // Check nested structs recursively
+              (targetField.dataType, sourceField.dataType) match {
+                case (targetNested: StructType, sourceNested) =>
+                  hasExtraTargetFields(targetNested, sourceNested)
+                case _ => false
+              }
+            case None => true // target has extra field not in source
+          }
+        }
+      case _ =>
+        // Should be caught earlier
+        throw SparkException.internalError(
+          s"Source type must be StructType but found: $sourceType")
+    }
+  }
+
+  /**
+   * As UPDATE SET * assigns struct fields individually (preserving existing fields),
+   * a null source struct would otherwise be expanded indiscriminately, i.e., a struct
+   * is created where all fields are null. Wraps a struct assignment with a condition
+   * that returns null if both of the following are true:
+   *
+   *  - the source struct is null
+   *  - the target struct is null OR the target struct is the same as the source struct
+   *
+   * If the condition is not true, the original structure is preserved. This includes
+   * cases where the source was a struct of nulls, or where the target had extra fields
+   * (including null ones); both cases retain the assignment to a struct of nulls.
+   *
+   * @param key the original assignment key (target struct) expression
+   * @param value the original assignment value (source struct) expression
+   * @param structType the target struct type
+   * @param structExpression the resulting create-struct expression to wrap
+   * @param colPath the column path for error reporting
+   * @param addError error reporting function
+   * @return the wrapped expression with null checks
+   */
+  private def fixNullExpansion(
+      key: Expression,
+      value: Expression,
+      structType: StructType,
+      structExpression: Expression,
+      colPath: Seq[String],
+      addError: String => Unit): Expression = {
+    // As StoreAssignmentPolicy.LEGACY is not allowed in DSv2, always add a null check
+    // for non-nullable columns
+    if (!key.nullable) {
+      AssertNotNull(value)
Review Comment:
I tried this change, and I remember now why I needed to skip the `IF (...) NULL` wrapper for non-nullable fields.
This line,
https://github.com/apache/spark/blob/cc03f8548317dd29cfd7e3d49fa26b1c87f04757/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/AssignmentUtils.scala#L246
checks that the result of the assignment is non-nullable. With an `IF (...) NULL` wrapper the expression's nullability becomes true, so that check returns false and the analyzer re-applies the rule forever.
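
To illustrate the point outside of Spark, here is a minimal, self-contained Scala sketch (the `Expr` hierarchy and class names below are hypothetical stand-ins for Catalyst expressions, not Spark code): an `IF (...) NULL` wrapper is statically nullable no matter what it wraps, while an `AssertNotNull`-style wrapper is statically non-nullable, which is why only the latter satisfies a "result must be non-nullable" check.

```scala
// Hypothetical mini model of Catalyst expression nullability.
sealed trait Expr { def nullable: Boolean }

// A created struct value itself is never null.
case class CreateStruct(name: String) extends Expr {
  val nullable: Boolean = false
}

// Models IF (cond) NULL ELSE child: one branch is NULL,
// so the result is always statically nullable.
case class IfNullWrapper(child: Expr) extends Expr {
  val nullable: Boolean = true
}

// Models AssertNotNull: fails at runtime on null input,
// so the result is statically non-nullable.
case class AssertNotNullWrapper(child: Expr) extends Expr {
  val nullable: Boolean = false
}

object NullabilityDemo {
  def main(args: Array[String]): Unit = {
    val struct = CreateStruct("s")
    // Wrapping in IF (...) NULL flips nullability to true,
    // so a non-nullable target column is never "resolved".
    println(struct.nullable)                          // false
    println(IfNullWrapper(struct).nullable)           // true
    println(AssertNotNullWrapper(struct).nullable)    // false
  }
}
```

Under this model, a resolution rule that requires the assignment result to be non-nullable would reject the `IfNullWrapper` form on every pass, matching the infinite-analysis behavior described above.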
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]