[GitHub] [spark] cloud-fan commented on a diff in pull request #37211: [SPARK-39644][SQL] Add RangePartitioning reporting for V2 DataSources

GitBox Wed, 17 Aug 2022 22:31:21 -0700


cloud-fan commented on code in PR #37211:
URL: https://github.com/apache/spark/pull/37211#discussion_r948663939



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Relation.scala:
##########
@@ -119,13 +119,16 @@ case class DataSourceV2Relation(
  * @param output the output attributes of this relation
  * @param keyGroupedPartitioning if set, the partitioning expressions that are used to split the
  *                               rows in the scan across different partitions
- * @param ordering if set, the ordering provided by the scan
+ * @param rangePartitioning if set, the range partitioning expressions that are used to split the
+ *                               rows in the scan across different partitions
+ * @param ordering if set, the in-partition ordering provided by the scan
  */
 case class DataSourceV2ScanRelation(
     relation: DataSourceV2Relation,
     scan: Scan,
     output: Seq[AttributeReference],
     keyGroupedPartitioning: Option[Seq[Expression]] = None,
+    rangePartitioning: Option[Seq[SortOrder]] = None,

Review Comment:
   We need to clearly define the semantics here, as there are now two sort orders. More specifically, what if a scan reports both `RangePartitioning` and ordering? There are a few options:
   1. Require the reported `RangePartitioning` and ordering to be compatible.
   2. Let `RangePartitioning` define only the cross-partition ordering, while `SupportsReportOrdering` covers the data ordering within each partition.
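
To make the difference between the two properties concrete, here is a minimal sketch with no Spark dependency. It models a scan's output as plain sequences of integer keys; the `OrderingSemantics` object and its helper names are hypothetical, are not part of this PR, and do not correspond to any Spark API. It only illustrates that, under option 2, cross-partition ordering and in-partition ordering are independent, and that a global ordering would follow only when both hold.

```scala
object OrderingSemantics {
  // Hypothetical model: a scan's output is a sequence of partitions,
  // and each partition is a sequence of integer keys.
  type Partitions = Seq[Seq[Int]]

  // Cross-partition ordering: every key in one partition is <= every key
  // in the next non-empty partition. Checking adjacent pairs is enough,
  // because max(a) <= min(b) <= max(b) <= min(c) chains transitively.
  def isRangePartitioned(parts: Partitions): Boolean = {
    val nonEmpty = parts.filter(_.nonEmpty)
    nonEmpty.zip(nonEmpty.drop(1)).forall { case (a, b) => a.max <= b.min }
  }

  // In-partition ordering: each partition is sorted on its own,
  // with no guarantee about how partitions relate to each other.
  def isSortedWithinPartitions(parts: Partitions): Boolean =
    parts.forall(p => p.zip(p.drop(1)).forall { case (x, y) => x <= y })

  // Global ordering: concatenating the partitions yields a sorted sequence.
  def isGloballySorted(parts: Partitions): Boolean = {
    val all = parts.flatten
    all.zip(all.drop(1)).forall { case (x, y) => x <= y }
  }
}
```

For example, `Seq(Seq(3, 1, 2), Seq(6, 4, 5))` is range partitioned but unsorted within partitions, while `Seq(Seq(1, 4), Seq(2, 3))` is sorted within partitions but not range partitioned. Neither is globally sorted, which is why the relationship between the two reported properties needs to be spelled out.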




