hvanhovell commented on code in PR #48791:
URL: https://github.com/apache/spark/pull/48791#discussion_r1876968390
##########
sql/connect/common/src/main/protobuf/spark/connect/relations.proto:
##########
@@ -97,13 +98,60 @@ message Relation {
// Catalog API (experimental / unstable)
Catalog catalog = 200;
+ // ML relation
+ MlRelation ml_relation = 300;
+
// This field is used to mark extensions to the protocol. When plugins generate arbitrary
// relations they can add them here. During the planning the correct resolution is done.
google.protobuf.Any extension = 998;
Unknown unknown = 999;
}
}
+// Relation to represent ML world
+message MlRelation {
+ oneof ml_type {
+ Transform transform = 1;
+ FetchAttr fetch_attr = 2;
+ }
+ // Relation to represent transform(input) of the operator
+ // which could be a cached model or a new transformer
+ message Transform {
+ oneof operator {
+ // Object reference
+ ObjectRef obj_ref = 1;
+ // Could be an ML transformer like VectorAssembler
+ MlOperator transformer = 2;
+ }
+ // the input dataframe
+ Relation input = 3;
+ // the operator specific parameters
+ MlParams params = 4;
+ }
+}
+
+// Message for fetching attribute from object on the server side.
+// FetchAttr can be represented as a Relation or a ML command
+// Eg, model.coefficients, model.summary.weightedPrecision
+// or model.summary.roc which returns a DataFrame
+message FetchAttr {
Review Comment:
TBH this feels a bit too much like a free-for-all. I understand the need to
fetch attributes; however, as soon as we are passing arguments, it feels like
we are doing far more than just reading data. Why is this needed? Is there a
way to make this more structured?
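For context on the concern above, here is a minimal Python sketch of how a server might dispatch the two `MlRelation` variants in the diff. It is not the PR's implementation. `MlHandler`, the in-memory `cache` and `registry`, the toy model classes, and the `path`/`args` fields on `FetchAttr` (whose body the diff does not show) are all hypothetical, and lists of dict rows stand in for the input `Relation`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union

# Hypothetical stand-ins for the proto messages. Transform mirrors the
# fields in the diff; FetchAttr's `path`/`args` fields are assumptions,
# since its body is not shown.

@dataclass
class ObjectRef:
    id: str  # key of an object cached on the server

@dataclass
class MlOperator:
    name: str  # name of a transformer class to instantiate

@dataclass
class Transform:
    operator: Union[ObjectRef, MlOperator]  # the `operator` oneof
    input: List[Dict[str, Any]]             # rows standing in for a Relation
    params: Dict[str, Any] = field(default_factory=dict)

@dataclass
class FetchAttr:
    obj_ref: ObjectRef
    path: List[str]                         # e.g. ["summary", "weightedPrecision"]
    args: List[Any] = field(default_factory=list)

class MlHandler:
    """Toy server-side dispatcher for the two MlRelation variants."""

    def __init__(self) -> None:
        self.cache: Dict[str, Any] = {}      # ObjectRef.id -> fitted model
        self.registry: Dict[str, type] = {}  # MlOperator.name -> class

    def handle(self, rel: Any) -> Any:
        if isinstance(rel, Transform):
            return self._transform(rel)
        if isinstance(rel, FetchAttr):
            return self._fetch_attr(rel)
        raise TypeError(f"unsupported ml_type: {type(rel).__name__}")

    def _transform(self, t: Transform) -> List[Dict[str, Any]]:
        if isinstance(t.operator, ObjectRef):
            op = self.cache[t.operator.id]  # cached model; params unused here
        else:
            op = self.registry[t.operator.name](**t.params)  # new transformer
        return op.transform(t.input)

    def _fetch_attr(self, f: FetchAttr) -> Any:
        value = self.cache[f.obj_ref.id]
        for name in f.path:  # walk e.g. model.summary.weightedPrecision
            value = getattr(value, name)
        # With arguments, the final attribute is *called*: that is method
        # invocation on a server-side object, not just a read.
        return value(*f.args) if f.args else value

class Summary:
    def __init__(self, weighted_precision: float) -> None:
        self.weightedPrecision = weighted_precision

class ToyModel:
    """Stands in for a fitted model with coefficients and a summary."""

    def __init__(self, coefficients: List[float]) -> None:
        self.coefficients = coefficients
        self.summary = Summary(0.9)

    def predict(self, features: List[float]) -> float:
        return sum(c * x for c, x in zip(self.coefficients, features))

    def transform(self, rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        return [dict(r, prediction=self.predict(r["features"])) for r in rows]

class Doubler:
    """A toy stateless transformer, in the role the diff gives VectorAssembler."""

    def __init__(self, inputCol: str, outputCol: str) -> None:
        self.inputCol, self.outputCol = inputCol, outputCol

    def transform(self, rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        return [dict(r, **{self.outputCol: r[self.inputCol] * 2}) for r in rows]

handler = MlHandler()
handler.cache["m1"] = ToyModel([1.0, 2.0])
handler.registry["Doubler"] = Doubler

print(handler.handle(FetchAttr(ObjectRef("m1"), ["summary", "weightedPrecision"])))  # 0.9
print(handler.handle(FetchAttr(ObjectRef("m1"), ["predict"], [[3.0, 4.0]])))        # 11.0
```

The first `FetchAttr` call is a plain read along an attribute chain. The second passes an argument, so the same message ends up invoking `predict` on the cached model, which is the kind of open-ended behavior the review comment is questioning.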
--
This is an automated message from the Apache Git Service.