sunchao commented on a change in pull request #35512:
URL: https://github.com/apache/spark/pull/35512#discussion_r806123708
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
##########
@@ -101,6 +101,14 @@ case class ClusteredDistribution(
* Since this distribution relies on [[HashPartitioning]] on the physical partitioning of the
* stateful operator, only [[HashPartitioning]] (and HashPartitioning in
* [[PartitioningCollection]]) can satisfy this distribution.
+ *
+ * NOTE: This is applied only stream-stream join as of now. For other stateful operators, we have
Review comment:
"applied only to stream-stream join"?
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
##########
@@ -101,6 +101,14 @@ case class ClusteredDistribution(
* Since this distribution relies on [[HashPartitioning]] on the physical partitioning of the
* stateful operator, only [[HashPartitioning]] (and HashPartitioning in
* [[PartitioningCollection]]) can satisfy this distribution.
+ *
+ * NOTE: This is applied only stream-stream join as of now. For other stateful operators, we have
+ * been using ClusteredDistribution, which could construct the physical partitioning of the state
+ * in different way. (ClusteredDistribution requires relaxed condition and multiple
+ * partitionings can satisfy the requirement.) We need to construct the way to fix this with
+ * minimizing possibility to break the existing checkpoints.
+ *
+ * TODO: SPARK-38204 to address above note.
Review comment:
nit: we usually use the pattern `TODO(SPARK-38204)` for this.
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/physical/partitioning.scala
##########
@@ -101,6 +101,14 @@ case class ClusteredDistribution(
* Since this distribution relies on [[HashPartitioning]] on the physical partitioning of the
* stateful operator, only [[HashPartitioning]] (and HashPartitioning in
* [[PartitioningCollection]]) can satisfy this distribution.
+ *
+ * NOTE: This is applied only stream-stream join as of now. For other stateful operators, we have
+ * been using ClusteredDistribution, which could construct the physical partitioning of the state
+ * in different way. (ClusteredDistribution requires relaxed condition and multiple
Review comment:
Maybe "in a different way (ClusteredDistribution requires ...)": no period after "way".
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]