SCHJonathan commented on code in PR #52328:
URL: https://github.com/apache/spark/pull/52328#discussion_r2356354984


##########
sql/connect/server/src/test/scala/org/apache/spark/sql/connect/pipelines/SparkDeclarativePipelinesServerSuite.scala:
##########
@@ -487,59 +487,99 @@ class SparkDeclarativePipelinesServerSuite
     }
   }
 
-  gridTest("DefineDataset returns resolved data name for default 
catalog/schema")(
-    Seq(
-      ("TEMPORARY_VIEW", DatasetType.TEMPORARY_VIEW, "tv"),
-      ("TABLE", DatasetType.TABLE, "tb"),
-      ("MV", DatasetType.MATERIALIZED_VIEW, "mv")
+  private case class DefineDatasetTestCase(
+    name: String,
+    datasetType: DatasetType,
+    datasetName: String,
+    defaultCatalog: String = "",
+    defaultDatabase: String = "",
+    expectedResolvedName: String
+  )
+
+  private val defineDatasetDefaultTests = Seq(
+    DefineDatasetTestCase(
+      name = "TEMPORARY_VIEW",
+      datasetType = DatasetType.TEMPORARY_VIEW,
+      datasetName = "tv",
+      expectedResolvedName = "`tv`"
+    ),
+    DefineDatasetTestCase(
+      name = "TABLE",
+      datasetType = DatasetType.TABLE,
+      datasetName = "tb",
+      expectedResolvedName = "`spark_catalog`.`default`.`tb`"
+    ),
+    DefineDatasetTestCase(
+      name = "MV",
+      datasetType = DatasetType.MATERIALIZED_VIEW,
+      datasetName = "mv",
+      expectedResolvedName = "`spark_catalog`.`default`.`mv`"
     )
-  ) { case (_, datasetType, datasetName) =>
+  ).map(tc => tc.name -> tc).toMap
+
+  private val defineDatasetCustomTests = Seq(
+    DefineDatasetTestCase(
+      "TEMPORARY_VIEW",
+      DatasetType.TEMPORARY_VIEW,
+      "tv",
+      "custom_catalog",
+      "custom_db",
+      "`tv`"

Review Comment:
   nit: name the parameters like you did previously, so that it is readable



##########
sql/connect/common/src/main/protobuf/spark/connect/pipelines.proto:
##########
@@ -154,9 +154,11 @@ message PipelineCommandResult {
     optional string dataflow_graph_id = 1;
   }
   message DefineDatasetResult {
+    // A quoted identifier string representing fully resolved name of the 
dataset
     optional string resolved_data_name = 1;
   }
   message DefineFlowResult {
+    // A quoted identifier string representing fully resolved name of the flow

Review Comment:
   let's just say the `resolved name of the flow` and not expose the 
implementation detail in the API doc, as the implementation can always change



##########
sql/connect/common/src/main/protobuf/spark/connect/pipelines.proto:
##########
@@ -146,11 +146,21 @@ message PipelineCommand {
 message PipelineCommandResult {
   oneof result_type {
     CreateDataflowGraphResult create_dataflow_graph_result = 1;
+    DefineDatasetResult define_dataset_result = 2;
+    DefineFlowResult define_flow_result = 3;
   }
   message CreateDataflowGraphResult {
     // The ID of the created graph.
     optional string dataflow_graph_id = 1;
   }
+  message DefineDatasetResult {
+    // A quoted identifier string representing fully resolved name of the 
dataset
+    optional string resolved_data_name = 1;
+  }
+  message DefineFlowResult {
+    // A quoted identifier string representing fully resolved name of the flow
+    optional string resolved_flow_name = 1;
+  }

Review Comment:
   Yeah, look at the CI failure — there should be a CI job failing because we 
are missing the Python stub, and the failure message should instruct you which 
script to run to generate it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

Reply via email to