[GitHub] [spark] cloud-fan commented on a diff in pull request #42174: [SPARK-44503][SQL] Add analysis and planning for PARTITION BY and ORDER BY clause after TABLE arguments for TVF calls

via GitHub Tue, 01 Aug 2023 06:18:48 -0700


cloud-fan commented on code in PR #42174:
URL: https://github.com/apache/spark/pull/42174#discussion_r1280631670



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/FunctionTableSubqueryArgumentExpression.scala:
##########
@@ -32,11 +32,40 @@ import org.apache.spark.sql.types.DataType
  * table in the catalog. In the latter case, the relation argument comprises
  * a table subquery that may itself refer to one or more tables in its own
  * FROM clause.
+ *
+ * Each TABLE argument may also optionally include a PARTITION BY clause. If present, these indicate
+ * how to logically split up the input relation such that the table-valued function evaluates
+ * exactly once for each partition, and returns the union of all results. If no partitioning list is
+ * present, this splitting of the input relation is undefined. Furthermore, if the PARTITION BY
+ * clause includes a following ORDER BY clause, Catalyst will sort the rows in each partition such
+ * that the table-valued function receives them one-by-one in the requested order. Otherwise, if no
+ * such ordering is specified, the ordering of rows within each partition is undefined.
+ *
+ * @param plan the logical plan provided as input for the table argument as either a logical
+ *             relation or as a more complex logical plan in the event of a table subquery.
+ * @param outerAttrs outer references of this subquery plan, generally empty since these table
+ *                   arguments do not allow correlated references currently
+ * @param exprId expression ID of this subquery expression, generally generated afresh each time
+ * @param partitionByExpressions if non-empty, the TABLE argument included the PARTITION BY clause
+ *                               to indicate that the input relation should be repartitioned by the
+ *                               hash of the provided expressions, such that all the rows with each
+ *                               unique combination of values of the partitioning expressions will
+ *                               be consumed by exactly one instance of the table function class.
+ * @param withSinglePartition if true, the TABLE argument included the WITH SINGLE PARTITION clause
+ *                            to indicate that the entire input relation should be repartitioned to
+ *                            one worker for consumption by exactly one instance of the table
+ *                            function class.
+ * @param orderByExpressions if non-empty, the TABLE argument included the ORDER BY clause to
+ *                           indicate that the rows within each partition of the table function are
+ *                           to arrive in the provided order.
  */
 case class FunctionTableSubqueryArgumentExpression(
     plan: LogicalPlan,
     outerAttrs: Seq[Expression] = Seq.empty,
-    exprId: ExprId = NamedExpression.newExprId)
+    exprId: ExprId = NamedExpression.newExprId,
+    partitionByExpressions: Seq[Expression] = Seq.empty,
+    withSinglePartition: Boolean = false,
+    orderByExpressions: Seq[SortOrder] = Seq.empty)
   extends SubqueryExpression(plan, outerAttrs, exprId, Seq.empty, None) with Unevaluable {
 

Review Comment:
   Shall we add an assert here that `partitionByExpressions` must be `Nil` if `withSinglePartition` is true?
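
   A minimal sketch of what such a check could look like. It uses simplified stand-in types (`Expr` and `TableArgument` are hypothetical names, not Catalyst classes) so that it runs on its own; the field names mirror the parameters in the diff above, and the error message wording is an assumption.

```scala
// Hypothetical stand-in for a Catalyst expression; for illustration only.
final case class Expr(sql: String)

// Hypothetical, simplified model of the TABLE argument's partitioning options.
final case class TableArgument(
    partitionByExpressions: Seq[Expr] = Seq.empty,
    withSinglePartition: Boolean = false,
    orderByExpressions: Seq[Expr] = Seq.empty) {
  // Runs in the constructor body: a single-partition request must not
  // also carry a PARTITION BY list.
  assert(
    !withSinglePartition || partitionByExpressions.isEmpty,
    "WITH SINGLE PARTITION cannot be combined with a non-empty PARTITION BY list")
}
```

   With this in place, constructing `TableArgument(Seq(Expr("a")), withSinglePartition = true)` throws an `AssertionError`, while either option on its own is accepted.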



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


