grundprinzip commented on code in PR #39017:
URL: https://github.com/apache/spark/pull/39017#discussion_r1047188863
##########
connector/connect/common/src/main/protobuf/spark/connect/relations.proto:
##########
@@ -601,3 +602,10 @@ message Unpivot {
// (Required) Name of the value column.
string value_column_name = 5;
}
+
+// Randomly splits this Dataset with the provided weights.
+// Note: this message is just a wrapper for input relation.
+message RandomSplit {
+ // (Required) The input relation.
+ Relation input = 1;
Review Comment:
I might be missing some context here, so please help me out :)
It looks like you want to express that RandomSplit can never occur outside
of Sample, so the wrapping would be something like:
```
Sample(RandomSplit(input))
```
In the current design of the relations, every relation is more or less a
valid top-level object that can be used independently. So in this case I feel
that RandomSplit should rather be a field of the Sample message than a
relation of its own.
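To make the suggestion concrete, here is a rough sketch of what nesting could look like. This is only an illustration, not the actual `relations.proto`: the package name, the `random_split` field name, the field number 100, and the empty body of the nested message are all hypothetical, and the existing `Sample` fields are left out.

```protobuf
syntax = "proto3";

// Hypothetical package so the sketch stands alone; not spark.connect.
package example.sketch;

// Stand-in for the real Relation message.
message Relation {}

// Abbreviated Sample; the existing sampling fields are omitted here.
message Sample {
  // (Required) The input relation.
  Relation input = 1;

  // Hypothetical field: present only when this Sample is one part of a
  // random split. Name and number are placeholders.
  RandomSplit random_split = 100;

  // Nested inside Sample, so it is not one of the top-level relations
  // and cannot be used on its own.
  message RandomSplit {
    // Split-specific settings would go here (assumption: none shown in the PR).
  }
}
```

With this shape, a client would only ever send `Sample`, and the split-specific information travels as part of it instead of as a wrapper relation around the input.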
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]