EnricoMi opened a new pull request #33545: URL: https://github.com/apache/spark/pull/33545
### What changes were proposed in this pull request?

As @HyukjinKwon pointed out, the Observation API (Scala, Java, PySpark) could return a `Map` / `Dict`. It currently returns `Row` simply because the metrics are retrieved from the listener as rows, internally to `Observation`. Since the Observation API hides that detail from the user, there is no need to return `Row`.

While touching this code, this PR also moves the unit tests from `DataFrameSuite.scala` to `DatasetSuite.scala` and from `JavaDataFrameSuite.java` to `JavaDatasetSuite.java`, which are a better place for them.

### Why are the changes needed?

This simplifies the API and access to the metrics, especially in Java. There is no need for the `Row` concept when retrieving the observation result.

### Does this PR introduce _any_ user-facing change?

Yes, it replaces `get` with `getAsRow` and introduces `getAsMap`, `getAsJavaMap`, and `getAsDict`. This removes `Observation.get`, but that method was only just added in the current 3.3.0 version.

### How was this patch tested?

This is tested in `DatasetSuite.SPARK-34806: observation on datasets`, `JavaDatasetSuite.testObservation` and `test_dataframe.test_observe`.
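To illustrate why a map or dict is the simpler shape for observed metrics, here is a minimal pure-Python sketch. It does not use Spark at all: `MetricsRow` and `as_dict` are hypothetical stand-ins for a row of metrics and for the kind of conversion a method like `getAsDict` might perform, and the metric names and values are made up.

```python
from collections import namedtuple

# Hypothetical stand-in for a row of observed metrics (not Spark's Row class).
MetricsRow = namedtuple("MetricsRow", ["cnt", "sum", "mean"])


def as_dict(row):
    """Convert a row-like named tuple into a plain dict keyed by metric name."""
    return dict(zip(row._fields, row))


# Made-up metric values, for illustration only.
row = MetricsRow(cnt=99, sum=4950, mean=50.0)
metrics = as_dict(row)

# Callers look metrics up by name with ordinary dict access,
# without needing to know anything about the row type.
print(metrics["cnt"])
print(sorted(metrics))
```

With a dict (or a `Map` in Scala and Java), callers only need the standard collection API of their language, which is the simplification this PR is after.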
