Re: [PR] [SPARK-56650][SDP][CONNECT] Add AutoCDC Spark Connect APIs [spark]

via GitHub Wed, 20 May 2026 11:13:10 -0700


AnishMahto commented on code in PR #55589:
URL: https://github.com/apache/spark/pull/55589#discussion_r3276070395



##########
sql/connect/common/src/main/protobuf/spark/connect/pipelines.proto:
##########
@@ -154,6 +156,61 @@ message PipelineCommand {
       optional spark.connect.Relation relation = 1;
     }
 
+    // Details for Apply Changes Into (ACI) flows.
+    message AutoCdcFlowDetails {
+      // The name of the CDC source to stream from.
+      optional string source = 1;
+
+      // Column(s) that uniquely identify a row in source and target data.
+      repeated Expression keys = 2;
+
+      // Expression to order the source data.
+      optional Expression sequence_by = 3;
+
+      // Optional condition applied to source and target for optimizations 
like partition pruning.
+      optional Expression where = 4;
+
+      // Whether to ignore null values in source data updates.
+      optional bool ignore_null_updates = 5;

Review Comment:
   IIRC in the SPIP we decided to drop the boolean ignore null configuration 
altogether in favor of the ignore null columns list?



##########
sql/connect/common/src/main/protobuf/spark/connect/pipelines.proto:
##########
@@ -154,6 +156,61 @@ message PipelineCommand {
       optional spark.connect.Relation relation = 1;
     }
 
+    // Details for Apply Changes Into (ACI) flows.
+    message AutoCdcFlowDetails {
+      // The name of the CDC source to stream from.
+      optional string source = 1;
+
+      // Column(s) that uniquely identify a row in source and target data.
+      repeated Expression keys = 2;
+
+      // Expression to order the source data.
+      optional Expression sequence_by = 3;
+
+      // Optional condition applied to source and target for optimizations 
like partition pruning.
+      optional Expression where = 4;
+
+      // Whether to ignore null values in source data updates.
+      optional bool ignore_null_updates = 5;
+
+      // Delete condition for the merged operation.
+      optional Expression apply_as_deletes = 6;
+
+      // Truncate condition for the merged operation.
+      optional Expression apply_as_truncates = 7;
+
+      // Columns included in the output table.
+      repeated Expression column_list = 8;
+
+      // Columns excluded from the output table.
+      repeated Expression except_column_list = 9;
+
+      // SCD Type for target table.
+      SCDType stored_as_scd_type = 10;
+
+      // Columns tracked for change history.
+      repeated Expression track_history_column_list = 11;
+
+      // Columns not tracked for change history.
+      repeated Expression track_history_except_column_list = 12;
+
+      // Subset of columns to ignore null in updates.
+      repeated Expression ignore_null_updates_column_list = 14;
+
+      // Subset of columns excluded from ignoring null in updates.
+      repeated Expression ignore_null_updates_except_column_list = 15;
+
+      // Column indicating which user columns to update or ignore.
+      optional Expression columns_to_update = 16;

Review Comment:
   Same thing here, is this an API we reached consensus on in the SPIP?



##########
sql/connect/common/src/main/protobuf/spark/connect/pipelines.proto:
##########
@@ -154,6 +156,61 @@ message PipelineCommand {
       optional spark.connect.Relation relation = 1;
     }
 
+    // Details for Apply Changes Into (ACI) flows.
+    message AutoCdcFlowDetails {
+      // The name of the CDC source to stream from.
+      optional string source = 1;
+
+      // Column(s) that uniquely identify a row in source and target data.
+      repeated Expression keys = 2;
+
+      // Expression to order the source data.
+      optional Expression sequence_by = 3;
+
+      // Optional condition applied to source and target for optimizations 
like partition pruning.
+      optional Expression where = 4;

Review Comment:
   Let's drop this as it's not an API we've agreed on for yet in the SPIP? We 
can just reserve the proto number if that's important.



##########
python/pyspark/sql/connect/proto/pipelines_pb2.py:
##########
@@ -38,12 +38,13 @@
 from google.protobuf import any_pb2 as google_dot_protobuf_dot_any__pb2
 from google.protobuf import timestamp_pb2 as 
google_dot_protobuf_dot_timestamp__pb2
 from pyspark.sql.connect.proto import common_pb2 as 
spark_dot_connect_dot_common__pb2
+from pyspark.sql.connect.proto import expressions_pb2 as 
spark_dot_connect_dot_expressions__pb2
 from pyspark.sql.connect.proto import relations_pb2 as 
spark_dot_connect_dot_relations__pb2
 from pyspark.sql.connect.proto import types_pb2 as 
spark_dot_connect_dot_types__pb2
 
 
 DESCRIPTOR = _descriptor_pool.Default().AddSerializedFile(
-    
b'\n\x1dspark/connect/pipelines.proto\x12\rspark.connect\x1a\x19google/protobuf/any.proto\x1a\x1fgoogle/protobuf/timestamp.proto\x1a\x1aspark/connect/common.proto\x1a\x1dspark/connect/relations.proto\x1a\x19spark/connect/types.proto"\xa4\'\n\x0fPipelineCommand\x12h\n\x15\x63reate_dataflow_graph\x18\x01
 
\x01(\x0b\x32\x32.spark.connect.PipelineCommand.CreateDataflowGraphH\x00R\x13\x63reateDataflowGraph\x12R\n\rdefine_output\x18\x02
 
\x01(\x0b\x32+.spark.connect.PipelineCommand.DefineOutputH\x00R\x0c\x64\x65\x66ineOutput\x12L\n\x0b\x64\x65\x66ine_flow\x18\x03
 
\x01(\x0b\x32).spark.connect.PipelineCommand.DefineFlowH\x00R\ndefineFlow\x12\x62\n\x13\x64rop_dataflow_graph\x18\x04
 
\x01(\x0b\x32\x30.spark.connect.PipelineCommand.DropDataflowGraphH\x00R\x11\x64ropDataflowGraph\x12\x46\n\tstart_run\x18\x05
 
\x01(\x0b\x32\'.spark.connect.PipelineCommand.StartRunH\x00R\x08startRun\x12r\n\x19\x64\x65\x66ine_sql_graph_elements\x18\x06
 \x01(\x0b\x32\x35.spark.connect.PipelineCommand.DefineSqlGraph
 
ElementsH\x00R\x16\x64\x65\x66ineSqlGraphElements\x12\xa1\x01\n*get_query_function_execution_signal_stream\x18\x07
 
\x01(\x0b\x32\x44.spark.connect.PipelineCommand.GetQueryFunctionExecutionSignalStreamH\x00R%getQueryFunctionExecutionSignalStream\x12\x88\x01\n!define_flow_query_function_result\x18\x08
 
\x01(\x0b\x32<.spark.connect.PipelineCommand.DefineFlowQueryFunctionResultH\x00R\x1d\x64\x65\x66ineFlowQueryFunctionResult\x12\x65\n\x14\x65xecute_output_flows\x18\t
 
\x01(\x0b\x32\x31.spark.connect.PipelineCommand.ExecuteOutputFlowsH\x00R\x12\x65xecuteOutputFlows\x12\x35\n\textension\x18\xe7\x07
 
\x01(\x0b\x32\x14.google.protobuf.AnyH\x00R\textension\x1a\xb4\x02\n\x13\x43reateDataflowGraph\x12,\n\x0f\x64\x65\x66\x61ult_catalog\x18\x01
 
\x01(\tH\x00R\x0e\x64\x65\x66\x61ultCatalog\x88\x01\x01\x12.\n\x10\x64\x65\x66\x61ult_database\x18\x02
 
\x01(\tH\x01R\x0f\x64\x65\x66\x61ultDatabase\x88\x01\x01\x12Z\n\x08sql_conf\x18\x05
 \x03(\x0b\x32?.spark.connect.PipelineCommand.CreateDataflowGraph.SqlCon
 fEntryR\x07sqlConf\x1a:\n\x0cSqlConfEntry\x12\x10\n\x03key\x18\x01 
\x01(\tR\x03key\x12\x14\n\x05value\x18\x02 
\x01(\tR\x05value:\x02\x38\x01\x42\x12\n\x10_default_catalogB\x13\n\x11_default_database\x1aZ\n\x11\x44ropDataflowGraph\x12/\n\x11\x64\x61taflow_graph_id\x18\x01
 
\x01(\tH\x00R\x0f\x64\x61taflowGraphId\x88\x01\x01\x42\x14\n\x12_dataflow_graph_id\x1a\x92\n\n\x0c\x44\x65\x66ineOutput\x12/\n\x11\x64\x61taflow_graph_id\x18\x01
 
\x01(\tH\x01R\x0f\x64\x61taflowGraphId\x88\x01\x01\x12$\n\x0boutput_name\x18\x02
 \x01(\tH\x02R\noutputName\x88\x01\x01\x12?\n\x0boutput_type\x18\x03 
\x01(\x0e\x32\x19.spark.connect.OutputTypeH\x03R\noutputType\x88\x01\x01\x12\x1d\n\x07\x63omment\x18\x04
 \x01(\tH\x04R\x07\x63omment\x88\x01\x01\x12X\n\x14source_code_location\x18\x05 
\x01(\x0b\x32!.spark.connect.SourceCodeLocationH\x05R\x12sourceCodeLocation\x88\x01\x01\x12_\n\rtable_details\x18\x06
 
\x01(\x0b\x32\x38.spark.connect.PipelineCommand.DefineOutput.TableDetailsH\x00R\x0ctableDetails\x12\\\n\x0csink_
 details\x18\x07 
\x01(\x0b\x32\x37.spark.connect.PipelineCommand.DefineOutput.SinkDetailsH\x00R\x0bsinkDetails\x12\x35\n\textension\x18\xe7\x07
 
\x01(\x0b\x32\x14.google.protobuf.AnyH\x00R\textension\x1a\xc0\x03\n\x0cTableDetails\x12x\n\x10table_properties\x18\x01
 
\x03(\x0b\x32M.spark.connect.PipelineCommand.DefineOutput.TableDetails.TablePropertiesEntryR\x0ftableProperties\x12%\n\x0epartition_cols\x18\x02
 \x03(\tR\rpartitionCols\x12\x1b\n\x06\x66ormat\x18\x03 
\x01(\tH\x01R\x06\x66ormat\x88\x01\x01\x12\x43\n\x10schema_data_type\x18\x04 
\x01(\x0b\x32\x17.spark.connect.DataTypeH\x00R\x0eschemaDataType\x12%\n\rschema_string\x18\x05
 \x01(\tH\x00R\x0cschemaString\x12-\n\x12\x63lustering_columns\x18\x06 
\x03(\tR\x11\x63lusteringColumns\x1a\x42\n\x14TablePropertiesEntry\x12\x10\n\x03key\x18\x01
 \x01(\tR\x03key\x12\x14\n\x05value\x18\x02 
\x01(\tR\x05value:\x02\x38\x01\x42\x08\n\x06schemaB\t\n\x07_format\x1a\xd1\x01\n\x0bSinkDetails\x12^\n\x07options\x18\x01
 \x03(\x0b\x32\x44.spark.connect.Pip
 
elineCommand.DefineOutput.SinkDetails.OptionsEntryR\x07options\x12\x1b\n\x06\x66ormat\x18\x02
 
\x01(\tH\x00R\x06\x66ormat\x88\x01\x01\x1a:\n\x0cOptionsEntry\x12\x10\n\x03key\x18\x01
 \x01(\tR\x03key\x12\x14\n\x05value\x18\x02 
\x01(\tR\x05value:\x02\x38\x01\x42\t\n\x07_formatB\t\n\x07\x64\x65tailsB\x14\n\x12_dataflow_graph_idB\x0e\n\x0c_output_nameB\x0e\n\x0c_output_typeB\n\n\x08_commentB\x17\n\x15_source_code_location\x1a\xff\x06\n\nDefineFlow\x12/\n\x11\x64\x61taflow_graph_id\x18\x01
 \x01(\tH\x01R\x0f\x64\x61taflowGraphId\x88\x01\x01\x12 \n\tflow_name\x18\x02 
\x01(\tH\x02R\x08\x66lowName\x88\x01\x01\x12\x33\n\x13target_dataset_name\x18\x03
 \x01(\tH\x03R\x11targetDatasetName\x88\x01\x01\x12Q\n\x08sql_conf\x18\x04 
\x03(\x0b\x32\x36.spark.connect.PipelineCommand.DefineFlow.SqlConfEntryR\x07sqlConf\x12
 \n\tclient_id\x18\x05 
\x01(\tH\x04R\x08\x63lientId\x88\x01\x01\x12X\n\x14source_code_location\x18\x06 
\x01(\x0b\x32!.spark.connect.SourceCodeLocationH\x05R\x12sourceCodeLocation\x88\x01\x0
 1\x12x\n\x15relation_flow_details\x18\x07 
\x01(\x0b\x32\x42.spark.connect.PipelineCommand.DefineFlow.WriteRelationFlowDetailsH\x00R\x13relationFlowDetails\x12\x35\n\textension\x18\xe7\x07
 
\x01(\x0b\x32\x14.google.protobuf.AnyH\x00R\textension\x12\x17\n\x04once\x18\x08
 
\x01(\x08H\x06R\x04once\x88\x01\x01\x1a:\n\x0cSqlConfEntry\x12\x10\n\x03key\x18\x01
 \x01(\tR\x03key\x12\x14\n\x05value\x18\x02 
\x01(\tR\x05value:\x02\x38\x01\x1a\x61\n\x18WriteRelationFlowDetails\x12\x38\n\x08relation\x18\x01
 
\x01(\x0b\x32\x17.spark.connect.RelationH\x00R\x08relation\x88\x01\x01\x42\x0b\n\t_relation\x1a:\n\x08Response\x12
 \n\tflow_name\x18\x01 
\x01(\tH\x00R\x08\x66lowName\x88\x01\x01\x42\x0c\n\n_flow_nameB\t\n\x07\x64\x65tailsB\x14\n\x12_dataflow_graph_idB\x0c\n\n_flow_nameB\x16\n\x14_target_dataset_nameB\x0c\n\n_client_idB\x17\n\x15_source_code_locationB\x07\n\x05_once\x1a\xe4\x02\n\x12\x45xecuteOutputFlows\x12U\n\rdefine_output\x18\x01
 \x01(\x0b\x32+.spark.connect.PipelineCommand.DefineOutputH\x00R\x
 0c\x64\x65\x66ineOutput\x88\x01\x01\x12L\n\x0c\x64\x65\x66ine_flows\x18\x02 
\x03(\x0b\x32).spark.connect.PipelineCommand.DefineFlowR\x0b\x64\x65\x66ineFlows\x12&\n\x0c\x66ull_refresh\x18\x03
 \x01(\x08H\x01R\x0b\x66ullRefresh\x88\x01\x01\x12\x1d\n\x07storage\x18\x04 
\x01(\tH\x02R\x07storage\x88\x01\x01\x12\x33\n\textension\x18\xe7\x07 
\x03(\x0b\x32\x14.google.protobuf.AnyR\textensionB\x10\n\x0e_define_outputB\x0f\n\r_full_refreshB\n\n\x08_storage\x1a\xc2\x02\n\x08StartRun\x12/\n\x11\x64\x61taflow_graph_id\x18\x01
 
\x01(\tH\x00R\x0f\x64\x61taflowGraphId\x88\x01\x01\x12\x34\n\x16\x66ull_refresh_selection\x18\x02
 \x03(\tR\x14\x66ullRefreshSelection\x12-\n\x10\x66ull_refresh_all\x18\x03 
\x01(\x08H\x01R\x0e\x66ullRefreshAll\x88\x01\x01\x12+\n\x11refresh_selection\x18\x04
 \x03(\tR\x10refreshSelection\x12\x15\n\x03\x64ry\x18\x05 
\x01(\x08H\x02R\x03\x64ry\x88\x01\x01\x12\x1d\n\x07storage\x18\x06 
\x01(\tH\x03R\x07storage\x88\x01\x01\x42\x14\n\x12_dataflow_graph_idB\x13\n\x11_full_refresh_allB\
 
x06\n\x04_dryB\n\n\x08_storage\x1a\xc7\x01\n\x16\x44\x65\x66ineSqlGraphElements\x12/\n\x11\x64\x61taflow_graph_id\x18\x01
 
\x01(\tH\x00R\x0f\x64\x61taflowGraphId\x88\x01\x01\x12\'\n\rsql_file_path\x18\x02
 \x01(\tH\x01R\x0bsqlFilePath\x88\x01\x01\x12\x1e\n\x08sql_text\x18\x03 
\x01(\tH\x02R\x07sqlText\x88\x01\x01\x42\x14\n\x12_dataflow_graph_idB\x10\n\x0e_sql_file_pathB\x0b\n\t_sql_text\x1a\x9e\x01\n%GetQueryFunctionExecutionSignalStream\x12/\n\x11\x64\x61taflow_graph_id\x18\x01
 \x01(\tH\x00R\x0f\x64\x61taflowGraphId\x88\x01\x01\x12 \n\tclient_id\x18\x02 
\x01(\tH\x01R\x08\x63lientId\x88\x01\x01\x42\x14\n\x12_dataflow_graph_idB\x0c\n\n_client_id\x1a\xc6\x02\n\x1d\x44\x65\x66ineFlowQueryFunctionResult\x12$\n\tflow_name\x18\x01
 
\x01(\tB\x02\x18\x01H\x00R\x08\x66lowName\x88\x01\x01\x12O\n\x0f\x66low_identifier\x18\x04
 
\x01(\x0b\x32!.spark.connect.ResolvedIdentifierH\x01R\x0e\x66lowIdentifier\x88\x01\x01\x12/\n\x11\x64\x61taflow_graph_id\x18\x02
 \x01(\tH\x02R\x0f\x64\x61taflowGraphId\x88\x0
 1\x01\x12\x38\n\x08relation\x18\x03 
\x01(\x0b\x32\x17.spark.connect.RelationH\x03R\x08relation\x88\x01\x01\x42\x0c\n\n_flow_nameB\x12\n\x10_flow_identifierB\x14\n\x12_dataflow_graph_idB\x0b\n\t_relationB\x0e\n\x0c\x63ommand_type"\xf0\x05\n\x15PipelineCommandResult\x12\x81\x01\n\x1c\x63reate_dataflow_graph_result\x18\x01
 
\x01(\x0b\x32>.spark.connect.PipelineCommandResult.CreateDataflowGraphResultH\x00R\x19\x63reateDataflowGraphResult\x12k\n\x14\x64\x65\x66ine_output_result\x18\x02
 
\x01(\x0b\x32\x37.spark.connect.PipelineCommandResult.DefineOutputResultH\x00R\x12\x64\x65\x66ineOutputResult\x12\x65\n\x12\x64\x65\x66ine_flow_result\x18\x03
 
\x01(\x0b\x32\x35.spark.connect.PipelineCommandResult.DefineFlowResultH\x00R\x10\x64\x65\x66ineFlowResult\x1a\x62\n\x19\x43reateDataflowGraphResult\x12/\n\x11\x64\x61taflow_graph_id\x18\x01
 
\x01(\tH\x00R\x0f\x64\x61taflowGraphId\x88\x01\x01\x42\x14\n\x12_dataflow_graph_id\x1a\x85\x01\n\x12\x44\x65\x66ineOutputResult\x12W\n\x13resolved_identifier\x18\x
 01 
\x01(\x0b\x32!.spark.connect.ResolvedIdentifierH\x00R\x12resolvedIdentifier\x88\x01\x01\x42\x16\n\x14_resolved_identifier\x1a\x83\x01\n\x10\x44\x65\x66ineFlowResult\x12W\n\x13resolved_identifier\x18\x01
 
\x01(\x0b\x32!.spark.connect.ResolvedIdentifierH\x00R\x12resolvedIdentifier\x88\x01\x01\x42\x16\n\x14_resolved_identifierB\r\n\x0bresult_type"I\n\x13PipelineEventResult\x12\x32\n\x05\x65vent\x18\x01
 
\x01(\x0b\x32\x1c.spark.connect.PipelineEventR\x05\x65vent"t\n\rPipelineEvent\x12\x38\n\ttimestamp\x18\x01
 
\x01(\x0b\x32\x1a.google.protobuf.TimestampR\ttimestamp\x12\x1d\n\x07message\x18\x02
 
\x01(\tH\x00R\x07message\x88\x01\x01\x42\n\n\x08_message"\xf1\x01\n\x12SourceCodeLocation\x12
 \n\tfile_name\x18\x01 
\x01(\tH\x00R\x08\x66ileName\x88\x01\x01\x12$\n\x0bline_number\x18\x02 
\x01(\x05H\x01R\nlineNumber\x88\x01\x01\x12,\n\x0f\x64\x65\x66inition_path\x18\x03
 
\x01(\tH\x02R\x0e\x64\x65\x66initionPath\x88\x01\x01\x12\x33\n\textension\x18\xe7\x07
 \x03(\x0b\x32\x14.google.protobuf.AnyR\texte
 
nsionB\x0c\n\n_file_nameB\x0e\n\x0c_line_numberB\x12\n\x10_definition_path"\x97\x01\n$PipelineQueryFunctionExecutionSignal\x12!\n\nflow_names\x18\x01
 \x03(\tB\x02\x18\x01R\tflowNames\x12L\n\x10\x66low_identifiers\x18\x02 
\x03(\x0b\x32!.spark.connect.ResolvedIdentifierR\x0f\x66lowIdentifiers"\xf0\x02\n\x17PipelineAnalysisContext\x12/\n\x11\x64\x61taflow_graph_id\x18\x01
 
\x01(\tH\x00R\x0f\x64\x61taflowGraphId\x88\x01\x01\x12,\n\x0f\x64\x65\x66inition_path\x18\x02
 \x01(\tH\x01R\x0e\x64\x65\x66initionPath\x88\x01\x01\x12$\n\tflow_name\x18\x03 
\x01(\tB\x02\x18\x01H\x02R\x08\x66lowName\x88\x01\x01\x12O\n\x0f\x66low_identifier\x18\x04
 
\x01(\x0b\x32!.spark.connect.ResolvedIdentifierH\x03R\x0e\x66lowIdentifier\x88\x01\x01\x12\x33\n\textension\x18\xe7\x07
 
\x03(\x0b\x32\x14.google.protobuf.AnyR\textensionB\x14\n\x12_dataflow_graph_idB\x12\n\x10_definition_pathB\x0c\n\n_flow_nameB\x12\n\x10_flow_identifier*i\n\nOutputType\x12\x1b\n\x17OUTPUT_TYPE_UNSPECIFIED\x10\x00\x12\x15\n\x11MATERIALIZED_VI
 
EW\x10\x01\x12\t\n\x05TABLE\x10\x02\x12\x12\n\x0eTEMPORARY_VIEW\x10\x03\x12\x08\n\x04SINK\x10\x04\x42\x36\n\x1eorg.apache.spark.connect.protoP\x01Z\x12internal/generatedb\x06proto3'
+    
b'\n\x1dspark/connect/pipelines.proto\x12\rspark.connect\x1a\x19google/protobuf/any.proto\x1a\x1fgoogle/protobuf/timestamp.proto\x1a\x1aspark/connect/common.proto\x1a\x1fspark/connect/expressions.proto\x1a\x1dspark/connect/relations.proto\x1a\x19spark/connect/types.proto"\xbb\x32\n\x0fPipelineCommand\x12h\n\x15\x63reate_dataflow_graph\x18\x01
 
\x01(\x0b\x32\x32.spark.connect.PipelineCommand.CreateDataflowGraphH\x00R\x13\x63reateDataflowGraph\x12R\n\rdefine_output\x18\x02
 
\x01(\x0b\x32+.spark.connect.PipelineCommand.DefineOutputH\x00R\x0c\x64\x65\x66ineOutput\x12L\n\x0b\x64\x65\x66ine_flow\x18\x03
 
\x01(\x0b\x32).spark.connect.PipelineCommand.DefineFlowH\x00R\ndefineFlow\x12\x62\n\x13\x64rop_dataflow_graph\x18\x04
 
\x01(\x0b\x32\x30.spark.connect.PipelineCommand.DropDataflowGraphH\x00R\x11\x64ropDataflowGraph\x12\x46\n\tstart_run\x18\x05
 
\x01(\x0b\x32\'.spark.connect.PipelineCommand.StartRunH\x00R\x08startRun\x12r\n\x19\x64\x65\x66ine_sql_graph_elements\x18\x06
 \x01(\x0b\x32\x35.spa
 
rk.connect.PipelineCommand.DefineSqlGraphElementsH\x00R\x16\x64\x65\x66ineSqlGraphElements\x12\xa1\x01\n*get_query_function_execution_signal_stream\x18\x07
 
\x01(\x0b\x32\x44.spark.connect.PipelineCommand.GetQueryFunctionExecutionSignalStreamH\x00R%getQueryFunctionExecutionSignalStream\x12\x88\x01\n!define_flow_query_function_result\x18\x08
 
\x01(\x0b\x32<.spark.connect.PipelineCommand.DefineFlowQueryFunctionResultH\x00R\x1d\x64\x65\x66ineFlowQueryFunctionResult\x12\x65\n\x14\x65xecute_output_flows\x18\t
 
\x01(\x0b\x32\x31.spark.connect.PipelineCommand.ExecuteOutputFlowsH\x00R\x12\x65xecuteOutputFlows\x12\x35\n\textension\x18\xe7\x07
 
\x01(\x0b\x32\x14.google.protobuf.AnyH\x00R\textension\x1a\xb4\x02\n\x13\x43reateDataflowGraph\x12,\n\x0f\x64\x65\x66\x61ult_catalog\x18\x01
 
\x01(\tH\x00R\x0e\x64\x65\x66\x61ultCatalog\x88\x01\x01\x12.\n\x10\x64\x65\x66\x61ult_database\x18\x02
 
\x01(\tH\x01R\x0f\x64\x65\x66\x61ultDatabase\x88\x01\x01\x12Z\n\x08sql_conf\x18\x05
 \x03(\x0b\x32?.spark.connect.P
 
ipelineCommand.CreateDataflowGraph.SqlConfEntryR\x07sqlConf\x1a:\n\x0cSqlConfEntry\x12\x10\n\x03key\x18\x01
 \x01(\tR\x03key\x12\x14\n\x05value\x18\x02 
\x01(\tR\x05value:\x02\x38\x01\x42\x12\n\x10_default_catalogB\x13\n\x11_default_database\x1aZ\n\x11\x44ropDataflowGraph\x12/\n\x11\x64\x61taflow_graph_id\x18\x01
 
\x01(\tH\x00R\x0f\x64\x61taflowGraphId\x88\x01\x01\x42\x14\n\x12_dataflow_graph_id\x1a\x92\n\n\x0c\x44\x65\x66ineOutput\x12/\n\x11\x64\x61taflow_graph_id\x18\x01
 
\x01(\tH\x01R\x0f\x64\x61taflowGraphId\x88\x01\x01\x12$\n\x0boutput_name\x18\x02
 \x01(\tH\x02R\noutputName\x88\x01\x01\x12?\n\x0boutput_type\x18\x03 
\x01(\x0e\x32\x19.spark.connect.OutputTypeH\x03R\noutputType\x88\x01\x01\x12\x1d\n\x07\x63omment\x18\x04
 \x01(\tH\x04R\x07\x63omment\x88\x01\x01\x12X\n\x14source_code_location\x18\x05 
\x01(\x0b\x32!.spark.connect.SourceCodeLocationH\x05R\x12sourceCodeLocation\x88\x01\x01\x12_\n\rtable_details\x18\x06
 \x01(\x0b\x32\x38.spark.connect.PipelineCommand.DefineOutput.TableDetai
 lsH\x00R\x0ctableDetails\x12\\\n\x0csink_details\x18\x07 
\x01(\x0b\x32\x37.spark.connect.PipelineCommand.DefineOutput.SinkDetailsH\x00R\x0bsinkDetails\x12\x35\n\textension\x18\xe7\x07
 
\x01(\x0b\x32\x14.google.protobuf.AnyH\x00R\textension\x1a\xc0\x03\n\x0cTableDetails\x12x\n\x10table_properties\x18\x01
 
\x03(\x0b\x32M.spark.connect.PipelineCommand.DefineOutput.TableDetails.TablePropertiesEntryR\x0ftableProperties\x12%\n\x0epartition_cols\x18\x02
 \x03(\tR\rpartitionCols\x12\x1b\n\x06\x66ormat\x18\x03 
\x01(\tH\x01R\x06\x66ormat\x88\x01\x01\x12\x43\n\x10schema_data_type\x18\x04 
\x01(\x0b\x32\x17.spark.connect.DataTypeH\x00R\x0eschemaDataType\x12%\n\rschema_string\x18\x05
 \x01(\tH\x00R\x0cschemaString\x12-\n\x12\x63lustering_columns\x18\x06 
\x03(\tR\x11\x63lusteringColumns\x1a\x42\n\x14TablePropertiesEntry\x12\x10\n\x03key\x18\x01
 \x01(\tR\x03key\x12\x14\n\x05value\x18\x02 
\x01(\tR\x05value:\x02\x38\x01\x42\x08\n\x06schemaB\t\n\x07_format\x1a\xd1\x01\n\x0bSinkDetails\x12^\n\x07options\x1
 8\x01 
\x03(\x0b\x32\x44.spark.connect.PipelineCommand.DefineOutput.SinkDetails.OptionsEntryR\x07options\x12\x1b\n\x06\x66ormat\x18\x02
 
\x01(\tH\x00R\x06\x66ormat\x88\x01\x01\x1a:\n\x0cOptionsEntry\x12\x10\n\x03key\x18\x01
 \x01(\tR\x03key\x12\x14\n\x05value\x18\x02 
\x01(\tR\x05value:\x02\x38\x01\x42\t\n\x07_formatB\t\n\x07\x64\x65tailsB\x14\n\x12_dataflow_graph_idB\x0e\n\x0c_output_nameB\x0e\n\x0c_output_typeB\n\n\x08_commentB\x17\n\x15_source_code_location\x1a\x96\x12\n\nDefineFlow\x12/\n\x11\x64\x61taflow_graph_id\x18\x01
 \x01(\tH\x01R\x0f\x64\x61taflowGraphId\x88\x01\x01\x12 \n\tflow_name\x18\x02 
\x01(\tH\x02R\x08\x66lowName\x88\x01\x01\x12\x33\n\x13target_dataset_name\x18\x03
 \x01(\tH\x03R\x11targetDatasetName\x88\x01\x01\x12Q\n\x08sql_conf\x18\x04 
\x03(\x0b\x32\x36.spark.connect.PipelineCommand.DefineFlow.SqlConfEntryR\x07sqlConf\x12
 \n\tclient_id\x18\x05 
\x01(\tH\x04R\x08\x63lientId\x88\x01\x01\x12X\n\x14source_code_location\x18\x06 
\x01(\x0b\x32!.spark.connect.SourceCodeLocati
 
onH\x05R\x12sourceCodeLocation\x88\x01\x01\x12x\n\x15relation_flow_details\x18\x07
 
\x01(\x0b\x32\x42.spark.connect.PipelineCommand.DefineFlow.WriteRelationFlowDetailsH\x00R\x13relationFlowDetails\x12q\n\x15\x61uto_cdc_flow_details\x18\n
 
\x01(\x0b\x32<.spark.connect.PipelineCommand.DefineFlow.AutoCdcFlowDetailsH\x00R\x12\x61utoCdcFlowDetails\x12\x35\n\textension\x18\xe7\x07
 
\x01(\x0b\x32\x14.google.protobuf.AnyH\x00R\textension\x12\x17\n\x04once\x18\x08
 
\x01(\x08H\x06R\x04once\x88\x01\x01\x1a:\n\x0cSqlConfEntry\x12\x10\n\x03key\x18\x01
 \x01(\tR\x03key\x12\x14\n\x05value\x18\x02 
\x01(\tR\x05value:\x02\x38\x01\x1a\x61\n\x18WriteRelationFlowDetails\x12\x38\n\x08relation\x18\x01
 
\x01(\x0b\x32\x17.spark.connect.RelationH\x00R\x08relation\x88\x01\x01\x42\x0b\n\t_relation\x1a\xdc\t\n\x12\x41utoCdcFlowDetails\x12\x1b\n\x06source\x18\x01
 \x01(\tH\x00R\x06source\x88\x01\x01\x12-\n\x04keys\x18\x02 
\x03(\x0b\x32\x19.spark.connect.ExpressionR\x04keys\x12?\n\x0bsequence_by\x18\x03
 \x01(\x0b\x32\x1
 
9.spark.connect.ExpressionH\x01R\nsequenceBy\x88\x01\x01\x12\x34\n\x05where\x18\x04
 
\x01(\x0b\x32\x19.spark.connect.ExpressionH\x02R\x05where\x88\x01\x01\x12\x33\n\x13ignore_null_updates\x18\x05
 
\x01(\x08H\x03R\x11ignoreNullUpdates\x88\x01\x01\x12H\n\x10\x61pply_as_deletes\x18\x06
 
\x01(\x0b\x32\x19.spark.connect.ExpressionH\x04R\x0e\x61pplyAsDeletes\x88\x01\x01\x12L\n\x12\x61pply_as_truncates\x18\x07
 
\x01(\x0b\x32\x19.spark.connect.ExpressionH\x05R\x10\x61pplyAsTruncates\x88\x01\x01\x12:\n\x0b\x63olumn_list\x18\x08
 
\x03(\x0b\x32\x19.spark.connect.ExpressionR\ncolumnList\x12G\n\x12\x65xcept_column_list\x18\t
 
\x03(\x0b\x32\x19.spark.connect.ExpressionR\x10\x65xceptColumnList\x12^\n\x12stored_as_scd_type\x18\n
 
\x01(\x0e\x32\x31.spark.connect.PipelineCommand.DefineFlow.SCDTypeR\x0fstoredAsScdType\x12T\n\x19track_history_column_list\x18\x0b
 
\x03(\x0b\x32\x19.spark.connect.ExpressionR\x16trackHistoryColumnList\x12\x61\n 
track_history_except_column_list\x18\x0c \x03(\x0b\x32\x19.spark.conn
 
ect.ExpressionR\x1ctrackHistoryExceptColumnList\x12_\n\x1fignore_null_updates_column_list\x18\x0e
 
\x03(\x0b\x32\x19.spark.connect.ExpressionR\x1bignoreNullUpdatesColumnList\x12l\n&ignore_null_updates_except_column_list\x18\x0f
 
\x03(\x0b\x32\x19.spark.connect.ExpressionR!ignoreNullUpdatesExceptColumnList\x12J\n\x11\x63olumns_to_update\x18\x10
 
\x01(\x0b\x32\x19.spark.connect.ExpressionH\x06R\x0f\x63olumnsToUpdate\x88\x01\x01\x42\t\n\x07_sourceB\x0e\n\x0c_sequence_byB\x08\n\x06_whereB\x16\n\x14_ignore_null_updatesB\x13\n\x11_apply_as_deletesB\x15\n\x13_apply_as_truncatesB\x14\n\x12_columns_to_update\x1a:\n\x08Response\x12
 \n\tflow_name\x18\x01 
\x01(\tH\x00R\x08\x66lowName\x88\x01\x01\x42\x0c\n\n_flow_name"C\n\x07SCDType\x12\x18\n\x14SCD_TYPE_UNSPECIFIED\x10\x00\x12\x0e\n\nSCD_TYPE_1\x10\x01\x12\x0e\n\nSCD_TYPE_2\x10\x02\x42\t\n\x07\x64\x65tailsB\x14\n\x12_dataflow_graph_idB\x0c\n\n_flow_nameB\x16\n\x14_target_dataset_nameB\x0c\n\n_client_idB\x17\n\x15_source_code_locationB\x07\n\x05_on
 ce\x1a\xe4\x02\n\x12\x45xecuteOutputFlows\x12U\n\rdefine_output\x18\x01 
\x01(\x0b\x32+.spark.connect.PipelineCommand.DefineOutputH\x00R\x0c\x64\x65\x66ineOutput\x88\x01\x01\x12L\n\x0c\x64\x65\x66ine_flows\x18\x02
 
\x03(\x0b\x32).spark.connect.PipelineCommand.DefineFlowR\x0b\x64\x65\x66ineFlows\x12&\n\x0c\x66ull_refresh\x18\x03
 \x01(\x08H\x01R\x0b\x66ullRefresh\x88\x01\x01\x12\x1d\n\x07storage\x18\x04 
\x01(\tH\x02R\x07storage\x88\x01\x01\x12\x33\n\textension\x18\xe7\x07 
\x03(\x0b\x32\x14.google.protobuf.AnyR\textensionB\x10\n\x0e_define_outputB\x0f\n\r_full_refreshB\n\n\x08_storage\x1a\xc2\x02\n\x08StartRun\x12/\n\x11\x64\x61taflow_graph_id\x18\x01
 
\x01(\tH\x00R\x0f\x64\x61taflowGraphId\x88\x01\x01\x12\x34\n\x16\x66ull_refresh_selection\x18\x02
 \x03(\tR\x14\x66ullRefreshSelection\x12-\n\x10\x66ull_refresh_all\x18\x03 
\x01(\x08H\x01R\x0e\x66ullRefreshAll\x88\x01\x01\x12+\n\x11refresh_selection\x18\x04
 \x03(\tR\x10refreshSelection\x12\x15\n\x03\x64ry\x18\x05 
\x01(\x08H\x02R\x03\x64ry\x8
 8\x01\x01\x12\x1d\n\x07storage\x18\x06 
\x01(\tH\x03R\x07storage\x88\x01\x01\x42\x14\n\x12_dataflow_graph_idB\x13\n\x11_full_refresh_allB\x06\n\x04_dryB\n\n\x08_storage\x1a\xc7\x01\n\x16\x44\x65\x66ineSqlGraphElements\x12/\n\x11\x64\x61taflow_graph_id\x18\x01
 
\x01(\tH\x00R\x0f\x64\x61taflowGraphId\x88\x01\x01\x12\'\n\rsql_file_path\x18\x02
 \x01(\tH\x01R\x0bsqlFilePath\x88\x01\x01\x12\x1e\n\x08sql_text\x18\x03 
\x01(\tH\x02R\x07sqlText\x88\x01\x01\x42\x14\n\x12_dataflow_graph_idB\x10\n\x0e_sql_file_pathB\x0b\n\t_sql_text\x1a\x9e\x01\n%GetQueryFunctionExecutionSignalStream\x12/\n\x11\x64\x61taflow_graph_id\x18\x01
 \x01(\tH\x00R\x0f\x64\x61taflowGraphId\x88\x01\x01\x12 \n\tclient_id\x18\x02 
\x01(\tH\x01R\x08\x63lientId\x88\x01\x01\x42\x14\n\x12_dataflow_graph_idB\x0c\n\n_client_id\x1a\xc6\x02\n\x1d\x44\x65\x66ineFlowQueryFunctionResult\x12$\n\tflow_name\x18\x01
 
\x01(\tB\x02\x18\x01H\x00R\x08\x66lowName\x88\x01\x01\x12O\n\x0f\x66low_identifier\x18\x04
 \x01(\x0b\x32!.spark.connect.Resolved
 
IdentifierH\x01R\x0e\x66lowIdentifier\x88\x01\x01\x12/\n\x11\x64\x61taflow_graph_id\x18\x02
 
\x01(\tH\x02R\x0f\x64\x61taflowGraphId\x88\x01\x01\x12\x38\n\x08relation\x18\x03
 
\x01(\x0b\x32\x17.spark.connect.RelationH\x03R\x08relation\x88\x01\x01\x42\x0c\n\n_flow_nameB\x12\n\x10_flow_identifierB\x14\n\x12_dataflow_graph_idB\x0b\n\t_relationB\x0e\n\x0c\x63ommand_type"\xf0\x05\n\x15PipelineCommandResult\x12\x81\x01\n\x1c\x63reate_dataflow_graph_result\x18\x01
 
\x01(\x0b\x32>.spark.connect.PipelineCommandResult.CreateDataflowGraphResultH\x00R\x19\x63reateDataflowGraphResult\x12k\n\x14\x64\x65\x66ine_output_result\x18\x02
 
\x01(\x0b\x32\x37.spark.connect.PipelineCommandResult.DefineOutputResultH\x00R\x12\x64\x65\x66ineOutputResult\x12\x65\n\x12\x64\x65\x66ine_flow_result\x18\x03
 
\x01(\x0b\x32\x35.spark.connect.PipelineCommandResult.DefineFlowResultH\x00R\x10\x64\x65\x66ineFlowResult\x1a\x62\n\x19\x43reateDataflowGraphResult\x12/\n\x11\x64\x61taflow_graph_id\x18\x01
 \x01(\tH\x00R\x0f\x64\x61t
 
aflowGraphId\x88\x01\x01\x42\x14\n\x12_dataflow_graph_id\x1a\x85\x01\n\x12\x44\x65\x66ineOutputResult\x12W\n\x13resolved_identifier\x18\x01
 
\x01(\x0b\x32!.spark.connect.ResolvedIdentifierH\x00R\x12resolvedIdentifier\x88\x01\x01\x42\x16\n\x14_resolved_identifier\x1a\x83\x01\n\x10\x44\x65\x66ineFlowResult\x12W\n\x13resolved_identifier\x18\x01
 
\x01(\x0b\x32!.spark.connect.ResolvedIdentifierH\x00R\x12resolvedIdentifier\x88\x01\x01\x42\x16\n\x14_resolved_identifierB\r\n\x0bresult_type"I\n\x13PipelineEventResult\x12\x32\n\x05\x65vent\x18\x01
 
\x01(\x0b\x32\x1c.spark.connect.PipelineEventR\x05\x65vent"t\n\rPipelineEvent\x12\x38\n\ttimestamp\x18\x01
 
\x01(\x0b\x32\x1a.google.protobuf.TimestampR\ttimestamp\x12\x1d\n\x07message\x18\x02
 
\x01(\tH\x00R\x07message\x88\x01\x01\x42\n\n\x08_message"\xf1\x01\n\x12SourceCodeLocation\x12
 \n\tfile_name\x18\x01 
\x01(\tH\x00R\x08\x66ileName\x88\x01\x01\x12$\n\x0bline_number\x18\x02 
\x01(\x05H\x01R\nlineNumber\x88\x01\x01\x12,\n\x0f\x64\x65\x66inition_path\x
 18\x03 
\x01(\tH\x02R\x0e\x64\x65\x66initionPath\x88\x01\x01\x12\x33\n\textension\x18\xe7\x07
 
\x03(\x0b\x32\x14.google.protobuf.AnyR\textensionB\x0c\n\n_file_nameB\x0e\n\x0c_line_numberB\x12\n\x10_definition_path"\x97\x01\n$PipelineQueryFunctionExecutionSignal\x12!\n\nflow_names\x18\x01
 \x03(\tB\x02\x18\x01R\tflowNames\x12L\n\x10\x66low_identifiers\x18\x02 
\x03(\x0b\x32!.spark.connect.ResolvedIdentifierR\x0f\x66lowIdentifiers"\xf0\x02\n\x17PipelineAnalysisContext\x12/\n\x11\x64\x61taflow_graph_id\x18\x01
 
\x01(\tH\x00R\x0f\x64\x61taflowGraphId\x88\x01\x01\x12,\n\x0f\x64\x65\x66inition_path\x18\x02
 \x01(\tH\x01R\x0e\x64\x65\x66initionPath\x88\x01\x01\x12$\n\tflow_name\x18\x03 
\x01(\tB\x02\x18\x01H\x02R\x08\x66lowName\x88\x01\x01\x12O\n\x0f\x66low_identifier\x18\x04
 
\x01(\x0b\x32!.spark.connect.ResolvedIdentifierH\x03R\x0e\x66lowIdentifier\x88\x01\x01\x12\x33\n\textension\x18\xe7\x07
 
\x03(\x0b\x32\x14.google.protobuf.AnyR\textensionB\x14\n\x12_dataflow_graph_idB\x12\n\x10_definition_pat
 
hB\x0c\n\n_flow_nameB\x12\n\x10_flow_identifier*i\n\nOutputType\x12\x1b\n\x17OUTPUT_TYPE_UNSPECIFIED\x10\x00\x12\x15\n\x11MATERIALIZED_VIEW\x10\x01\x12\t\n\x05TABLE\x10\x02\x12\x12\n\x0eTEMPORARY_VIEW\x10\x03\x12\x08\n\x04SINK\x10\x04\x42\x36\n\x1eorg.apache.spark.connect.protoP\x01Z\x12internal/generatedb\x06proto3'

Review Comment:
   Is there a particular python version you generated the stubs with? IIRC stub 
generation can be python verison dependent.
   
   Just as a sanity check, if you were to regenerate these stubs using the same 
python environment on master, is it generating a no-diff stub?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [SPARK-56650][SDP][CONNECT] Add AutoCDC Spark Connect APIs [spark]

Reply via email to