cloud-fan commented on code in PR #42174:
URL: https://github.com/apache/spark/pull/42174#discussion_r1280631670
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/FunctionTableSubqueryArgumentExpression.scala:
##########
@@ -32,11 +32,40 @@ import org.apache.spark.sql.types.DataType
* table in the catalog. In the latter case, the relation argument comprises
* a table subquery that may itself refer to one or more tables in its own
* FROM clause.
+ *
+ * Each TABLE argument may also optionally include a PARTITION BY clause. If
present, these indicate
+ * how to logically split up the input relation such that the table-valued
function evaluates
+ * exactly once for each partition, and returns the union of all results. If
no partitioning list is
+ * present, this splitting of the input relation is undefined. Furthermore, if
the PARTITION BY
+ * clause includes a following ORDER BY clause, Catalyst will sort the rows in
each partition such
+ * that the table-valued function receives them one-by-one in the requested
order. Otherwise, if no
+ * such ordering is specified, the ordering of rows within each partition is
undefined.
+ *
+ * @param plan the logical plan provided as input for the table argument as
either a logical
+ * relation or as a more complex logical plan in the event of a
table subquery.
+ * @param outerAttrs outer references of this subquery plan, generally empty
since these table
+ * arguments do not allow correlated references currently
+ * @param exprId expression ID of this subquery expression, generally
generated afresh each time
+ * @param partitionByExpressions if non-empty, the TABLE argument included the
PARTITION BY clause
+ * to indicate that the input relation should be
repartitioned by the
+ * hash of the provided expressions, such that
all the rows with each
+ * unique combination of values of the
partitioning expressions will
+ * be consumed by exactly one instance of the
table function class.
+ * @param withSinglePartition if true, the TABLE argument included the WITH
SINGLE PARTITION clause
+ * to indicate that the entire input relation
should be repartitioned to
+ * one worker for consumption by exactly one
instance of the table
+ * function class.
+ * @param orderByExpressions if non-empty, the TABLE argument included the
ORDER BY clause to
+ * indicate that the rows within each partition of
the table function are
+ * to arrive in the provided order.
*/
case class FunctionTableSubqueryArgumentExpression(
plan: LogicalPlan,
outerAttrs: Seq[Expression] = Seq.empty,
- exprId: ExprId = NamedExpression.newExprId)
+ exprId: ExprId = NamedExpression.newExprId,
+ partitionByExpressions: Seq[Expression] = Seq.empty,
+ withSinglePartition: Boolean = false,
+ orderByExpressions: Seq[SortOrder] = Seq.empty)
extends SubqueryExpression(plan, outerAttrs, exprId, Seq.empty, None) with
Unevaluable {
Review Comment:
shall we add an assert here that `partitionByExpressions` must be Nil if
`withSinglePartition` is true?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]