zhenlineo commented on code in PR #40796:
URL: https://github.com/apache/spark/pull/40796#discussion_r1184408288
##########
connector/connect/server/src/main/scala/org/apache/spark/sql/connect/planner/SparkConnectPlanner.scala:
##########
@@ -664,7 +665,53 @@ class SparkConnectPlanner(val session: SparkSession) {
input: proto.Relation,
groupingExprs: java.util.List[proto.Expression],
sortingExprs: java.util.List[proto.Expression]):
UntypedKeyValueGroupedDataset = {
- val logicalPlan = transformRelation(input)
+ apply(transformRelation(input), groupingExprs, sortingExprs)
+ }
+
+ private def apply(
+ logicalPlan: LogicalPlan,
+ groupingExprs: java.util.List[proto.Expression],
+ sortingExprs: java.util.List[proto.Expression]):
UntypedKeyValueGroupedDataset = {
+ if (groupingExprs.size() == 1) {
+ createFromGroupByKeyFunc(logicalPlan, groupingExprs, sortingExprs)
+ } else if (groupingExprs.size() > 1) {
Review Comment:
I do not see a common path here. The nasty part is that we hide logic inside
grouping_exprs based on the count of the expressions. The alternative I can
think of is an UnresolvedFunc or a new Expression that would allow us to add
more logic, e.g.
```
message KeyValueGroupedDataset { // New Expression or Unresolved Func
// (Required) Input user-defined function. Defines the grouping func
CommonInlineUserDefinedFunction grouping_func = 1;
// (Optional) Extra grouping expressions needed for RelationalGroupedDataset
  repeated Expression grouping_expressions = 2;
}
```
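To illustrate the design point (a minimal sketch with purely hypothetical names, not actual Spark Connect types): with a dedicated message, the planner can dispatch on an explicit variant instead of inferring intent from the number of grouping expressions.
```scala
// Hypothetical sketch: the grouping intent is part of the message shape,
// rather than being inferred from how many expressions were sent.
sealed trait GroupingSpec
case class GroupByKeyFunc(funcName: String) extends GroupingSpec
case class GroupByExprs(exprs: Seq[String]) extends GroupingSpec

def describeGrouping(spec: GroupingSpec): String = spec match {
  // Explicit variant: a user-defined grouping function was supplied.
  case GroupByKeyFunc(f) => s"grouping via user-defined key function: $f"
  // Explicit variant: plain relational grouping expressions.
  case GroupByExprs(es)  => s"grouping via ${es.size} relational expression(s)"
}
```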
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]