cloud-fan commented on code in PR #55599:
URL: https://github.com/apache/spark/pull/55599#discussion_r3160524432


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/variant/variantExpressions.scala:
##########
@@ -952,3 +952,40 @@ case class SchemaOfVariantAgg(
   override protected def withNewChildInternal(newChild: Expression): 
Expression =
     copy(child = newChild)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(v) - Returns true if the variant is valid, false if it is 
malformed.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(parse_json('null'));
+       true
+      > SELECT _FUNC_(parse_json('[{"b":true,"a":0}]'));
+       true
+  """,
+  since = "4.2.0",
+  group = "variant_funcs"
+)
+case class IsValidVariant(child: Expression) extends UnaryExpression with 
ExpectsInputTypes {

Review Comment:
   Consider aligning this with the peer pattern used by `IsVariantNull` (line 
101), `SchemaOfVariant`, and `ParseJson`: extend `Predicate` (which provides 
`dataType = BooleanType`, making the override below redundant) and use 
`RuntimeReplaceable` + `StaticInvoke` instead of `nullSafeEval` + a hand-rolled 
`doGenCode`. Adding `def isValidVariant(input: VariantVal): Boolean = 
VariantUtil.isValidVariant(input.getValue, input.getMetadata)` to 
`VariantExpressionEvalUtils` and calling it through `StaticInvoke` would let 
you drop both `nullSafeEval` and `doGenCode` here. There's nothing 
instance-specific (no `addReferenceObj`-style state, unlike `ToVariantObject`) 
that prevents the cleaner pattern.



##########
common/variant/src/main/java/org/apache/spark/types/variant/VariantUtil.java:
##########
@@ -589,6 +590,80 @@ public static <T> T handleArray(byte[] value, int pos, 
ArrayHandler<T> handler)
     return handler.apply(size, offsetSize, offsetStart, dataStart);
   }
 
+  // Validate whether a variant is well-formed. Returns true if the variant is 
valid, false if it
+  // is malformed.
+  // Specifically, it returns true if and only if `toJson` doesn't throw an 
exception. The
+  // implementation also has the same structure as `toJson` (see 
`Variant.toJsonImpl`).

Review Comment:
   Two refinements worth noting in this comment:
   
   1. The "if and only if `toJson` doesn't throw" claim isn't strict. 
`Variant.toJson` first goes through the `Variant` constructor, which throws 
`VARIANT_CONSTRUCTOR_SIZE_LIMIT` for value/metadata over `SIZE_LIMIT` — 
`isValidVariant` skips that check. A TIMESTAMP/TIMESTAMP_NTZ value out of 
`Instant`'s range also passes `validateImpl` (which only calls `getLong`) but 
trips `DateTimeException`/`ArithmeticException` from `Instant.EPOCH.plus(...)` 
inside `toJsonImpl`. Consider softening to something like "returns true if the 
variant binary is structurally well-formed (all bounds and type-info checks 
pass)," or invoking `new Variant(...).toJson(...)` directly to honor the 
literal contract.
   
   2. The `try { ... } catch (SparkRuntimeException e)` is sound only because 
every helper invoked inside `validateImpl` throws `MALFORMED_VARIANT` / 
`UNKNOWN_PRIMITIVE_TYPE_IN_VARIANT` rather than raw 
`ArrayIndexOutOfBoundsException` — and this PR's `getDecimalWithOriginalScale` 
patch is exactly closing one such leak. A short note on `validateImpl` ("relies 
on every helper throwing `SparkRuntimeException` rather than AIOOBE on 
malformed input") would help future maintainers preserve that invariant when 
adding cases.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to