amaliujia commented on code in PR #38793:
URL: https://github.com/apache/spark/pull/38793#discussion_r1031921894


##########
connector/connect/src/main/protobuf/spark/connect/relations.proto:
##########
@@ -457,3 +458,16 @@ message RenameColumnsByNameToNameMap {
   // duplicated B are not allowed.
   map<string, string> rename_columns_map = 2;
 }
+
+// Adding columns or replacing the existing columns that has the same names.
+message WithColumns {
+  // (Required) The input relation.
+  Relation input = 1;
+
+  // (Required)
+  //
+  // Given a column name, apply corresponding expression on the column. If column
+  // name exists in the input relation, then replacing the column. if column name
+  // does not exist in the input relation, then adding the column.
+  map<string, Expression> cols_map = 2;

Review Comment:
   This is an interesting topic. Given the current withColumns API design, users already cannot control the order of the schema fields (please correct me if I am wrong).
   
   The nice thing to have is this: if a user calls withColumns with the same parameters on the same DataFrame twice, the user sees the same output schema through this proto (the ordering is not predictable, but it is at least consistent).
   
   However, I don't know if this is feasible. For Python/Scala, we offer APIs that take a Dict/Map, and once one is used, the iteration order over its values is not deterministic, so we cannot produce a stable ordering on the client side and thus cannot preserve it through the proto. Is this true?
   
   If we can ever stably produce an ordering on clients, we can of course maintain the ordering through the proto.
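   
   To make the question concrete, here is a rough Python sketch of what a stable client-side ordering could look like. It leans on two facts: Python dicts iterate in insertion order (a language guarantee since 3.7), and protobuf leaves map iteration order undefined, so an ordering would have to travel as a list of pairs rather than as a map. Everything else is an assumption for illustration: `to_ordered_pairs` and `apply_with_columns` are hypothetical helpers, expressions are plain strings standing in for Expression messages, and the replace-in-place / append-at-end rule is only a guess at the intended semantics, not the actual Spark Connect implementation.

```python
from typing import Dict, List, Tuple


def to_ordered_pairs(cols_map: Dict[str, str]) -> List[Tuple[str, str]]:
    """Flatten a column map into (name, expression) pairs.

    Hypothetical helper: expressions are plain strings here, standing in
    for Expression messages.
    """
    # dict iteration follows insertion order (guaranteed since Python 3.7),
    # so the resulting list is the same on every call with the same dict.
    return list(cols_map.items())


def apply_with_columns(schema: List[str], pairs: List[Tuple[str, str]]) -> List[str]:
    """Compute the output column order (assumed semantics, see lead-in).

    Existing columns keep their position; new columns are appended in the
    order in which they appear in `pairs`.
    """
    result = list(schema)
    for name, _expr in pairs:
        if name not in result:
            result.append(name)
    return result


pairs = to_ordered_pairs({"c": "a + b", "a": "a * 2", "d": "lit(1)"})
schema_first = apply_with_columns(["a", "b"], pairs)
schema_second = apply_with_columns(["a", "b"], pairs)
print(schema_first)  # ['a', 'b', 'c', 'd']
```

   Under these assumptions, calling it twice with the same input gives the same schema. The Scala side would need its own check, since the iteration order of a Map depends on which Map implementation is used.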
   
   cc @cloud-fan @HyukjinKwon 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

