xkrogen commented on code in PR #40147:
URL: https://github.com/apache/spark/pull/40147#discussion_r1117328132
##########
connector/connect/common/src/main/protobuf/spark/connect/base.proto:
##########
@@ -183,6 +183,60 @@ message ExecutePlanResponse {
}
}
+// Request to transfer client-local artifacts.
+message AddArtifactsRequest {
+
+ // Definition of an Artifact.
+ message Artifact {
+ // The name of the artifact is expected in the form of a "Relative Path"
that is made up of a
+ // sequence of directories and the final file element.
+ // Examples of "Relative Path"s: "jars/test.jar", "classes/xyz.class",
"abc.xyz", "a/b/X.jar".
+ // The server is expected to maintain the hierarchy of files as defined by
their name. (i.e
+ // The relative path of the file on the server's filesystem will be the
same as the name of
+ // the provided artifact)
+ string name = 1;
+ // Raw data.
+ bytes data = 2;
+ }
+
+ // A number of small artifacts batched into a single RPC.
+ message Batch {
+ repeated Artifact artifacts = 1;
+ }
+
+ // The client_id is set by the client to be able to collate streaming
responses from
+ // different queries.
+ string client_id = 1;
+
+ // User context
+ UserContext user_context = 2;
+
+ // The payload is either a batch of artifacts or a partial chunk of a large
artifact.
+ oneof payload {
+ Batch batch = 3;
Review Comment:
Agreed that we need the ability to both chunk and to batch, but my point is
that `repeated Artifact artifacts` already achieves this by allowing you to put
one artifact (for chunking purposes) or many (for small classes). So instead of:
```proto
message Batch {
repeated Artifact artifacts = 1;
}
oneof payload {
Batch batch = 3;
Artifact chunk = 4;
}
```
you would just have:
```
repeated Artifact artifacts = 3;
```
AFAICT there is no loss of functionality here? You can choose to send an
`AddArtifactsRequest` with just a single artifact, same as if you used the
`Artifact chunk` branch of `payload`. Or you can choose to send a request with
multiple artifacts, same as if you used the `Batch`. It's just that you don't
need to wrap those artifacts inside of a `Batch`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]