Re: [PR] [SPARK-52807][SDP] Proto changes to support analysis inside Declarative Pipelines query functions [spark]

via GitHub Sat, 18 Oct 2025 09:03:53 -0700


sryza commented on code in PR #52154:
URL: https://github.com/apache/spark/pull/52154#discussion_r2388283836



##########
sql/connect/common/src/main/protobuf/spark/connect/pipelines.proto:
##########
@@ -90,6 +92,24 @@ message PipelineCommand {
     optional string format = 8;
   }
 
+  // Metadata about why a query function failed to be executed successfully.
+  message QueryFunctionFailure {
+    // Identifier for a dataset within the graph that the query function needed to know the schema
+    // of but which had not yet been analyzed itself.
+    optional string missing_dependency = 1;

Review Comment:
   Based on further thinking and discussion, it seems like we might be able to 
just leave this out for now: when the server fails to analyze a plan, it knows 
what flow that plan was associated with, so it can just do the bookkeeping on 
its side.
   
   There might be some edge situations (e.g. at the beginning) where this means 
that we end up needing one more query function invocation than we otherwise 
would, for query functions that do analysis. But we can bias towards simplicity 
for now and optimize later if we need to.
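
A minimal sketch of what that server-side bookkeeping could look like, assuming the server records which flow each submitted plan belongs to. Everything here (`FlowBookkeeper`, `AnalysisError`, `register_plan`, `mark_analyzed`) is hypothetical and for illustration only; none of it is part of this PR or of the Spark Connect API.

```python
from dataclasses import dataclass, field


class AnalysisError(Exception):
    """Hypothetical error: a plan needs the schema of a dataset not yet analyzed."""

    def __init__(self, missing_dataset):
        super().__init__(f"dataset not yet analyzed: {missing_dataset}")
        self.missing_dataset = missing_dataset


@dataclass
class FlowBookkeeper:
    """Hypothetical server-side record of which flows wait on which datasets."""

    # Datasets whose schemas are already known to the server.
    analyzed: set = field(default_factory=set)
    # plan id -> flow name, recorded when a plan is submitted for a flow.
    plan_to_flow: dict = field(default_factory=dict)
    # flow name -> dataset the flow is currently waiting on.
    blocked: dict = field(default_factory=dict)

    def register_plan(self, plan_id, flow):
        self.plan_to_flow[plan_id] = flow

    def analyze(self, plan_id, required_datasets):
        # The server already knows the flow for this plan, so the client
        # does not have to report a missing dependency back to it.
        flow = self.plan_to_flow[plan_id]
        for dataset in required_datasets:
            if dataset not in self.analyzed:
                self.blocked[flow] = dataset
                raise AnalysisError(dataset)
        self.blocked.pop(flow, None)

    def mark_analyzed(self, dataset):
        # Return the flows whose query functions can now be re-invoked.
        self.analyzed.add(dataset)
        ready = sorted(f for f, d in self.blocked.items() if d == dataset)
        for flow in ready:
            del self.blocked[flow]
        return ready
```

In this sketch the failure itself carries no dependency metadata from the client: the server infers the blocked flow from the plan it failed to analyze. The cost is the one mentioned above, namely that a query function may occasionally be invoked once more than strictly necessary.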



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

