cloud-fan commented on code in PR #55790: URL: https://github.com/apache/spark/pull/55790#discussion_r3226725866
########## docs/sql-migration-guide.md: ########## @@ -31,6 +31,7 @@ license: | - Since Spark 4.2, Spark enables order-independent checksums for shuffle outputs by default to detect data inconsistencies during indeterminate shuffle stage retries. If a checksum mismatch is detected, Spark rolls back and re-executes all succeeding stages that depend on the shuffle output. If rolling back is not possible for some succeeding stages, the job will fail. To restore the previous behavior, set `spark.sql.shuffle.orderIndependentChecksum.enabled` and `spark.sql.shuffle.orderIndependentChecksum.enableFullRetryOnMismatch` to `false`. - Since Spark 4.2, support for Derby JDBC datasource is deprecated. - Since Spark 4.2, a new default method `mergeWith` has been added to the `CustomTaskMetric` interface. The default implementation sums the two metric values, which is correct for count-type metrics. Data source connector implementations that report non-additive metrics (e.g., maximum, average, compression ratio, or gauge values) must override `mergeWith` to provide correct merge semantics. +- Since Spark 4.2, the geospatial `GEOMETRY` and `GEOGRAPHY` types and the corresponding `ST_*` functions are enabled. See [Geospatial (Geometry/Geography) Types](sql-ref-geospatial-types.html) for additional details. Review Comment: I'd suggest dropping this bullet. The 4.1→4.2 entries above it all describe behavior changes that affect *existing* workloads (default-on order-independent checksums; new `CustomTaskMetric.mergeWith` contract for connectors; Derby JDBC deprecation), whereas turning on a new opt-in feature is purely additive: - `GEOMETRY` / `GEOGRAPHY` are non-reserved keywords in both ANSI and non-ANSI mode (`SqlBaseParser.g4:2048-2049, :2456-2457`), so they don't collide with existing identifiers. - `st_*` registry entries don't change resolution for code that doesn't call them. - The PR description correctly answers "Does this PR introduce any user-facing change? **No.**" — which is at odds with adding a migration bullet, since the migration guide is by definition for user-visible changes. This feels like release-notes territory rather than the migration guide. Happy to be talked out of it if there's a specific scenario I'm missing where an existing workload changes behavior at 4.2. ########## docs/sql-ref-datatypes.md: ########## @@ -95,8 +95,8 @@ Spark SQL and DataFrames support the following data types: * Spatial types Spatial objects as defined in the [OGC Simple Feature Access](https://portal.ogc.org/files/?artifact_id=25355) specification. - - `GeometryType`: Represents GEOMETRY values—spatial objects in a Cartesian coordinate system. The type can be fixed to a single SRID, e.g. `geometry(4326)`, or allow mixed SRIDs with `geometry(any)`. Default SRID when not specified is 4326 (WGS 84). - - `GeographyType`: Represents GEOGRAPHY values—spatial objects in a geographic coordinate system (latitude/longitude). Edge interpolation is always SPHERICAL. The type can be fixed to a single SRID, e.g. `geography(4326)`, or allow mixed SRIDs with `geography(any)`. Default SRID is 4326 (WGS 84). + - `GeometryType`: Represents GEOMETRY values, spatial objects in a Cartesian coordinate system. The type must be fixed to a single SRID, e.g. `geometry(4326)`, or allow mixed SRIDs with `geometry(any)`. In SQL, `GEOMETRY` columns must always be declared with an explicit SRID or `ANY`. + - `GeographyType`: Represents GEOGRAPHY values, spatial objects in a geographic coordinate system (latitude/longitude). Edge interpolation is always SPHERICAL. The type must be fixed to a single geographic SRID, e.g. `geography(4326)`, or allow mixed SRIDs with `geography(any)`. In SQL, `GEOGRAPHY` columns must always be declared with an explicit SRID or `ANY`. Review Comment: Minor wording: the "must be fixed ... or allow mixed SRIDs" parallelism reads awkwardly — `must` is a stronger commitment than the `or allow` alternative accepts. The original `can be fixed ... or allow ...` was natural English and didn't lose any meaning, because the very next sentence already states the mandatory part ("In SQL, `GEOMETRY` columns must always be declared ..."). Suggest reverting that verb. ```suggestion - `GeometryType`: Represents GEOMETRY values, spatial objects in a Cartesian coordinate system. The type can be fixed to a single SRID, e.g. `geometry(4326)`, or allow mixed SRIDs with `geometry(any)`. In SQL, `GEOMETRY` columns must always be declared with an explicit SRID or `ANY`. - `GeographyType`: Represents GEOGRAPHY values, spatial objects in a geographic coordinate system (latitude/longitude). Edge interpolation is always SPHERICAL. The type can be fixed to a single geographic SRID, e.g. `geography(4326)`, or allow mixed SRIDs with `geography(any)`. In SQL, `GEOGRAPHY` columns must always be declared with an explicit SRID or `ANY`. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
