grundprinzip commented on code in PR #39091:
URL: https://github.com/apache/spark/pull/39091#discussion_r1055510989
##########
connector/connect/common/src/main/protobuf/spark/connect/relations.proto:
##########
@@ -598,3 +599,18 @@ message ToSchema {
// The Server side will update the dataframe with this schema.
DataType schema = 2;
}
+
+// Collect arbitrary (named) metrics from a dataset.
+message CollectMetrics {
+ // (Required) The input relation.
+ Relation input = 1;
+
+ // (Required) Name of the metrics.
+ string name = 2;
+
+ // (Required) The metric sequence.
+ repeated Expression metrics = 3;
+
+  // (Optional) Indicates whether an Observation is used.
+  optional bool is_observation = 4;
Review Comment:
I have checked how `Dataset.observe` works. In Scala it has two
overloads, one taking an `Observation` and one taking a string name. Both
end up calling the same underlying method on the Dataset, and both wrap
the logical plan in a `CollectMetrics` node.
There is therefore no need for a special flag in the protocol for
`Observation`; the `Observation` convenience should be handled on the
client side.
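To make the suggestion concrete, here is a minimal, hypothetical Python sketch of what client-side resolution could look like. The dataclasses are stand-ins for the generated `CollectMetrics` proto message and the client's `Observation` helper, not the actual Spark Connect classes; the point is only that both "overloads" collapse to the same message before anything reaches the server:

```python
# Illustrative sketch only: these classes are hypothetical stand-ins for
# the generated CollectMetrics proto message and the client-side
# Observation helper; they are not the real Spark Connect classes.
from dataclasses import dataclass
from typing import Any, List, Union


@dataclass
class CollectMetrics:
    """Stand-in for the CollectMetrics relation: input plan, name, metrics."""
    input: Any
    name: str
    metrics: List[Any]


class Observation:
    """Stand-in for the client-side Observation convenience wrapper."""

    def __init__(self, name: str) -> None:
        self.name = name


def observe(input_plan: Any,
            observation_or_name: Union[Observation, str],
            *metrics: Any) -> CollectMetrics:
    # Both "overloads" are resolved on the client: an Observation is
    # reduced to its name, so the server only ever sees a plain
    # CollectMetrics with a string name -- no is_observation flag is
    # needed in the protocol.
    name = (observation_or_name.name
            if isinstance(observation_or_name, Observation)
            else observation_or_name)
    return CollectMetrics(input=input_plan, name=name, metrics=list(metrics))
```

With this shape, registering the `Observation` so its result can be retrieved later stays entirely a client-side concern, and the wire protocol carries only the name.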
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]