cloud-fan commented on code in PR #55631:
URL: https://github.com/apache/spark/pull/55631#discussion_r3184047829
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala:
##########
@@ -666,6 +666,14 @@ private[sql] object QueryExecutionErrors extends
QueryErrorsBase with ExecutionE
summary = "")
}
+ def stInvalidArgumentErrorInvalidEndiannessValue(
+ sqlFunction: String, endianness: String): SparkIllegalArgumentException
= {
+ new SparkIllegalArgumentException(
+ errorClass = "ST_INVALID_ARGUMENT.INVALID_ENDIANNESS_VALUE",
Review Comment:
**Critical: this error class is not defined.**
`ST_INVALID_ARGUMENT.INVALID_ENDIANNESS_VALUE` has no entry in
`common/utils/src/main/resources/error/error-conditions.json`. The result is
visible in `st-functions.sql.out` — every invalid-endianness case throws
`INTERNAL_ERROR` with `"Cannot find main error class
'ST_INVALID_ARGUMENT.INVALID_ENDIANNESS_VALUE'"` as the captured output. CI is
green only because the golden file matches the broken behavior. Add the entry,
regenerate the golden file, and verify the new output is a real `ST_INVALID_…`
error.
Also note: every other ST error class (`ST_INVALID_SRID_VALUE`,
`ST_INVALID_CRS_VALUE`, `ST_INVALID_ALGORITHM_VALUE`) is flat — using a
subClass form here breaks that convention. Consider naming this
`ST_INVALID_ENDIANNESS_VALUE` for consistency.
##########
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/util/Geography.java:
##########
@@ -130,8 +130,14 @@ public static Geography fromEwkt(byte[] ewkt) {
/** Geography binary standard format converters: WKB and EWKB. */
@Override
- public byte[] toWkb() {
- return toWkbInternal(DEFAULT_ENDIANNESS);
+ public byte[] toWkb(String endianness) {
+ // Check the endianness value.
+ if (!endianness.equalsIgnoreCase("NDR") &&
!endianness.equalsIgnoreCase("XDR")) {
+ throw QueryExecutionErrors.stInvalidArgumentErrorInvalidEndiannessValue(
+ "st_asbinary", endianness);
Review Comment:
Layer concern: hardcoding `"st_asbinary"` as the SQL function name inside
`Geography.toWkb(String)` couples this low-level Geo utility to a specific SQL
function. Any future caller of `toWkb(String)` (a different SQL function,
library code, an internal caller) would mislabel its errors. Validation should
live at the expression layer (e.g., in `ST_AsBinary` via `checkInputDataTypes`
for foldable cases, or in a shared `STUtils.parseEndianness` helper invoked
from `STUtils.stAsBinary`). The same hardcoded name appears in
`Geometry.java:137`.
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryExecutionErrors.scala:
##########
@@ -666,6 +666,14 @@ private[sql] object QueryExecutionErrors extends
QueryErrorsBase with ExecutionE
summary = "")
}
+ def stInvalidArgumentErrorInvalidEndiannessValue(
+ sqlFunction: String, endianness: String): SparkIllegalArgumentException
= {
+ new SparkIllegalArgumentException(
+ errorClass = "ST_INVALID_ARGUMENT.INVALID_ENDIANNESS_VALUE",
+ messageParameters = Map("sqlFunction" -> sqlFunction, "e" -> endianness)
Review Comment:
Use a descriptive parameter name instead of `e` — `endianness` matches the
function arg and reads better in the error message template.
##########
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/util/Geometry.java:
##########
@@ -130,8 +130,14 @@ public static Geometry fromEwkt(byte[] ewkt) {
/** Geometry binary standard format converters: WKB and EWKB. */
@Override
- public byte[] toWkb() {
- return toWkbInternal(DEFAULT_ENDIANNESS);
+ public byte[] toWkb(String endianness) {
+ // Check the endianness value.
+ if (!endianness.equalsIgnoreCase("NDR") &&
!endianness.equalsIgnoreCase("XDR")) {
+ throw QueryExecutionErrors.stInvalidArgumentErrorInvalidEndiannessValue(
+ "st_asbinary", endianness);
Review Comment:
Same as the comment on `Geography.java:137` — the validation block here is
identical. After moving validation up to a shared helper, both
`Geography.toWkb(String)` and `Geometry.toWkb(String)` can collapse to a single
delegation to `toWkb(ByteOrder)`.
##########
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/util/STUtils.java:
##########
@@ -78,6 +82,16 @@ public static GeometryVal geographyToGeometry(GeographyVal
geographyVal) {
return toPhysVal(Geometry.fromBytes(geographyVal.getBytes()));
}
+ // Cast geography to binary.
+ public static byte[] geographyToBinary(GeographyVal geographyVal) {
Review Comment:
`geographyToBinary` and `geometryToBinary` (lines 86 and 91) have
byte-for-byte the same body as the existing `stAsBinary(GeographyVal)` /
`stAsBinary(GeometryVal)` 1-arg overloads (lines 130 and 138). Two ways to
spell the same operation is more confusing than helpful — drop one set. The
simplest path: keep the `stAsBinary` 1-arg overloads and have
`ParquetWriteSupport` call those instead.
##########
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/util/STUtils.java:
##########
@@ -63,7 +67,7 @@ public static GeographyVal geometryToGeography(GeometryVal
geometryVal) {
}
// We also need to check whether the input geometry has coordinates in
geography bounds.
try {
- byte[] wkb = stAsBinary(geometryVal);
+ byte[] wkb = stAsBinary(geometryVal, ENDIANNESS_NDR);
Review Comment:
These internal callers (also lines 115, 123) call `stAsBinary(geo,
ENDIANNESS_NDR)` for known-NDR uses, which forces `UTF8String → String →
equalsIgnoreCase("NDR") / equalsIgnoreCase("XDR") → ByteOrder` for every
invocation — pure overhead. Call `geographyToBinary` / `geometryToBinary` (or
the 1-arg `stAsBinary` overloads) directly, like `ParquetWriteSupport.scala`
already does.
##########
sql/core/src/test/scala/org/apache/spark/sql/STExpressionsSuite.scala:
##########
@@ -476,16 +476,21 @@ class STExpressionsSuite
test("ST_AsBinary") {
// Test data: WKB representation of POINT(1 2).
- val wkb =
Hex.unhex("0101000000000000000000F03F0000000000000040".getBytes())
- val wkbLiteral = Literal.create(wkb, BinaryType)
+ val wkbNdr =
Hex.unhex("0101000000000000000000F03F0000000000000040".getBytes())
+ val wkbXdr =
Hex.unhex("00000000013FF00000000000004000000000000000".getBytes())
+ val wkbLiteral = Literal.create(wkbNdr, BinaryType)
// ST_GeogFromWKB and ST_AsBinary.
val geographyExpression = ST_GeogFromWKB(wkbLiteral)
assert(geographyExpression.dataType.sameType(defaultGeographyType))
- checkEvaluation(ST_AsBinary(geographyExpression), wkb)
+ checkEvaluation(new ST_AsBinary(geographyExpression), wkbNdr)
+ checkEvaluation(ST_AsBinary(geographyExpression, Literal.create("NDR")),
wkbNdr)
+ checkEvaluation(ST_AsBinary(geographyExpression, Literal.create("XDR")),
wkbXdr)
// ST_GeomFromWKB and ST_AsBinary.
val geometryExpression = new ST_GeomFromWKB(wkbLiteral)
assert(geometryExpression.dataType.sameType(defaultGeometryType))
- checkEvaluation(ST_AsBinary(geometryExpression), wkb)
+ checkEvaluation(new ST_AsBinary(geometryExpression), wkbNdr)
Review Comment:
Two test cases would close real coverage gaps: (1) mixed-case endianness —
the validation uses `equalsIgnoreCase`, so `'ndr'`, `'xdr'`, `'Ndr'` are
accepted, but no test exercises that path; (2) NULL endianness —
`ST_AsBinary(geo, NULL)` should propagate to NULL but isn't covered. Worth
adding both alongside the existing `'NDR'`/`'XDR'` cases.
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/st/stExpressions.scala:
##########
@@ -51,55 +52,69 @@ sealed trait GeospatialInputTypes extends
ImplicitCastInputTypes {
private[sql] object ExpressionDefaults {
val DEFAULT_GEOGRAPHY_SRID: Int = Geography.DEFAULT_SRID
val DEFAULT_GEOMETRY_SRID: Int = Geometry.DEFAULT_SRID
+ val DEFAULT_WKB_ENDIANNESS: String = "NDR"
Review Comment:
This duplicates `STUtils.NDR` (line 32 of `STUtils.java`). Reference
`STUtils.NDR` here instead of defining a parallel `DEFAULT_WKB_ENDIANNESS`.
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/st/stExpressions.scala:
##########
@@ -51,55 +52,69 @@ sealed trait GeospatialInputTypes extends
ImplicitCastInputTypes {
private[sql] object ExpressionDefaults {
val DEFAULT_GEOGRAPHY_SRID: Int = Geography.DEFAULT_SRID
val DEFAULT_GEOMETRY_SRID: Int = Geometry.DEFAULT_SRID
+ val DEFAULT_WKB_ENDIANNESS: String = "NDR"
}
/** ST writer expressions. */
/**
- * Returns the input GEOGRAPHY or GEOMETRY value in WKB format.
+ * Returns the input GEOGRAPHY or GEOMETRY value in WKB format using the
specified endianness, if
+ * provided. If no endianness is provided, it defaults to little endian.
* See
https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary
* for more details on the WKB format.
*/
@ExpressionDescription(
- usage = "_FUNC_(geo) - Returns the geospatial value (value of type GEOGRAPHY
or GEOMETRY) "
- + "in WKB format.",
+ usage = "_FUNC_(geo, endianness) - Returns the geospatial value (value of
type GEOGRAPHY or "
Review Comment:
Usage signature should mark `endianness` as optional, matching the
convention used by peers like `ST_GeomFromWKB` (`_FUNC_(wkb[, srid])`).
```suggestion
usage = "_FUNC_(geo[, endianness]) - Returns the geospatial value (value
of type GEOGRAPHY or "
```
##########
sql/core/src/test/resources/sql-tests/results/st-functions.sql.out:
##########
@@ -768,6 +768,200 @@ org.apache.spark.SparkIllegalArgumentException
}
+-- !query
+SELECT ST_AsBinary(NULL)
+-- !query schema
+struct<st_asbinary(NULL, NDR):binary>
+-- !query output
+NULL
+
+
+-- !query
+SELECT
hex(ST_AsBinary(ST_GeogFromWKB(X'0101000000000000000000F03F0000000000000040')))
+-- !query schema
+struct<hex(st_asbinary(st_geogfromwkb(X'0101000000000000000000F03F0000000000000040'),
NDR)):string>
+-- !query output
+0101000000000000000000F03F0000000000000040
+
+
+-- !query
+SELECT
hex(ST_AsBinary(ST_GeogFromWKB(X'00000000013FF00000000000004000000000000000'),
'NDR'))
+-- !query schema
+struct<hex(st_asbinary(st_geogfromwkb(X'00000000013FF00000000000004000000000000000'),
NDR)):string>
+-- !query output
+0101000000000000000000F03F0000000000000040
+
+
+-- !query
+SELECT
hex(ST_AsBinary(ST_GeogFromWKB(X'0101000000000000000000F03F0000000000000040'),
'XDR'))
+-- !query schema
+struct<hex(st_asbinary(st_geogfromwkb(X'0101000000000000000000F03F0000000000000040'),
XDR)):string>
+-- !query output
+00000000013FF00000000000004000000000000000
+
+
+-- !query
+SELECT
hex(ST_AsBinary(ST_GeomFromWKB(X'0101000000000000000000F03F0000000000000040')))
+-- !query schema
+struct<hex(st_asbinary(st_geomfromwkb(X'0101000000000000000000F03F0000000000000040',
0), NDR)):string>
+-- !query output
+0101000000000000000000F03F0000000000000040
+
+
+-- !query
+SELECT
hex(ST_AsBinary(ST_GeomFromWKB(X'00000000013FF00000000000004000000000000000'),
'NDR'))
+-- !query schema
+struct<hex(st_asbinary(st_geomfromwkb(X'00000000013FF00000000000004000000000000000',
0), NDR)):string>
Review Comment:
These `INTERNAL_ERROR` outputs (and the matching ones below for `'ABC'`,
`'big-endian'`, and the table-level cases) reflect the missing error class
flagged on `QueryExecutionErrors.scala`. After defining the
`ST_INVALID_…ENDIANNESS_VALUE` entry in `error-conditions.json` and
regenerating, these blocks should show the real ST error — not
`INTERNAL_ERROR`. Worth verifying carefully when the golden file is regenerated.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]