EnricoMi opened a new pull request #33545:
URL: https://github.com/apache/spark/pull/33545


   ### What changes were proposed in this pull request?
   As @HyukjinKwon pointed out, the Observation API (Scala, Java, PySpark) 
could return a `Map` / `Dict` instead of a `Row`. It currently returns `Row` 
only because `Observation` internally retrieves the metrics from the listener 
as rows. Since the Observation API hides that detail from the user, there is 
no need to return `Row`.
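
   The difference between the two return shapes can be sketched in plain Python. This is only an illustration, not Spark code: `MetricsRow` and `metrics_as_dict` are hypothetical names, and a `namedtuple` stands in for `Row` (a real `pyspark.sql.Row` offers a comparable `asDict()` method).

```python
from collections import namedtuple
from typing import Any, Dict

# Hypothetical stand-in for a Row of observed metrics (not a Spark class).
MetricsRow = namedtuple("MetricsRow", ["rows", "max_id"])


def metrics_as_dict(row: Any) -> Dict[str, Any]:
    """Convert a namedtuple-style metrics row into a plain dict keyed by metric name."""
    return dict(row._asdict())


row = MetricsRow(rows=3, max_id=42)

# Row-style access: the caller has to work with the row type and its fields.
print(row.rows, row.max_id)

# Dict-style access: a plain mapping from metric name to value.
metrics = metrics_as_dict(row)
print(metrics["rows"], metrics["max_id"])
```

   With the mapping form, callers only need standard dictionary operations and never touch a row type.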
   
   While touching this code, this also moves the unit tests from 
`DataFrameSuite.scala` to `DatasetSuite.scala` and from 
`JavaDataFrameSuite.java` to `JavaDatasetSuite.java`, which are better places 
for them.
   
   ### Why are the changes needed?
   This simplifies the API and makes the metrics easier to access, especially 
in Java. Users do not need the concept of a `Row` to retrieve the observation 
result.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. It replaces `get` with `getAsRow` and introduces `getAsMap`, 
`getAsJavaMap` and `getAsDict`. This removes `Observation.get`, but that 
method was only just added in the current 3.3.0 version.
   
   ### How was this patch tested?
   This is tested in `DatasetSuite.SPARK-34806: observation on datasets`, 
`JavaDatasetSuite.testObservation` and `test_dataframe.test_observe`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


