amaliujia commented on code in PR #38193:
URL: https://github.com/apache/spark/pull/38193#discussion_r992889787


##########
connector/connect/src/main/protobuf/spark/connect/relations.proto:
##########
@@ -31,7 +31,7 @@ option java_package = "org.apache.spark.connect.proto";
 message Relation {
   RelationCommon common = 1;
   oneof rel_type {
-    Read read = 2;
+    UnresolvedRelation unresolved_relation = 2;

Review Comment:
   I was thinking something similar: 
   
   It is a wrong pattern that a message contains many fields while a subset of 
the fields that belong to one type and others that belong to different types, 
which is option 1 that Wenchen mentioned.
   
   It's good that we keep all relevant fields into one message. If there is 
more than one message but we want to co-locate it then there is a type to 
differentiate those, which is Wenchen's option 2 and Martin's example, if I am 
not mis-read it.
   
   I think we are sort of having the consensus on the rule of when to split 
messages. 
   
   
   However the rule of when to co-locate messages here is still not clear on, 
for example, why we don't choose to define
   ```
   message ExecuteWithSingleInput {
    oneof node_type {
        Project,
        Filter
    } 
    
    message Project {
    
    }
    
    message Filter {
   
    }
   }
   ```
   
   Basically the rule to split messages is clear, but the rule to co-locate 
message is not. We can co-locate `Project` and `Filter` as they both have 
single relation input. But we have chosen not to in the proto. The naming in my 
example is bad and I am pretty we can have a better verb to describe it.
   
   To me the sharing semantic of a `unresolved_relation` and `data_source` is 
even less than `project` and `filter`. In Apache Calcite, it can merge 
`project` and `filter` and call it a `Calc`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to