zedtang commented on code in PR #47301:
URL: https://github.com/apache/spark/pull/47301#discussion_r1686736484
##########
connector/connect/client/jvm/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala:
##########
@@ -201,6 +201,22 @@ final class DataFrameWriter[T] private[sql] (ds: Dataset[T]) {
     this
   }

+  /**
+   * Clusters the output by the given columns on the file system. The rows with matching values in

Review Comment:
   sure, updated here and below

##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala:
##########
@@ -209,10 +209,25 @@ object ClusterBySpec {
     normalizeClusterBySpec(schema, clusterBySpec, resolver).toJson
   }

+  /**
+   * Converts a ClusterBySpec to a map of table properties used to store the clustering
+   * information in the table catalog.
+   *
+   * @param clusterBySpec : existing ClusterBySpec to be converted to properties.
+   */
+  def toProperties(clusterBySpec: ClusterBySpec): Map[String, String] = {

Review Comment:
   Besides the different return type, `toProperty` additionally validates the clustering columns against the table schema, so it takes two more input parameters (`schema` and `resolver`). I updated the comments.

##########
sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriterV2.scala:
##########
@@ -104,9 +106,27 @@ final class DataFrameWriterV2[T] private[sql](table: String, ds: Dataset[T])
     }

     this.partitioning = Some(asTransforms)
+    validatePartitioning()
+    this
+  }
+
+  @scala.annotation.varargs
+  override def clusterBy(colName: String, colNames: String*): CreateTableWriter[T] = {
+    this.clustering =
+      Some(ClusterByTransform((colName +: colNames).map(col => FieldReference(col))))
+    validatePartitioning()
     this
   }

+  /**
+   * Validate that clusterBy is not used with partitionBy.
+   */
+  private def validatePartitioning(): Unit = {
+    if (partitioning.nonEmpty && clustering.nonEmpty) {

Review Comment:
   For DFW V2, [bucketBy](https://github.com/apache/spark/blob/3af2ea973c458b9bd9818d6af733683fb15fbc19/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriterV2.scala#L98) is implemented as a special partitionedBy transform, so I didn't call it out here.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
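The `toProperty` vs. `toProperties` distinction discussed above can be sketched in plain Scala. This is a Spark-free illustration with hypothetical names (`SimpleClusterBySpec`, `clusteringColumnsKey`), not Spark's actual implementation: one path plainly serializes the clustering columns into a table property, the other first validates the columns against a schema via a resolver function.

```scala
// Hypothetical, Spark-free sketch of the two conversion paths discussed above.
// Names here are illustrative; the real logic lives in ClusterBySpec.
final case class SimpleClusterBySpec(columnNames: Seq[String])

object SimpleClusterBySpec {
  private val clusteringColumnsKey = "clusteringColumns"

  // toProperties-style: plain conversion to catalog table properties,
  // no validation of the columns.
  def toProperties(spec: SimpleClusterBySpec): Map[String, String] =
    Map(clusteringColumnsKey -> spec.columnNames.mkString(","))

  // toProperty-style: same conversion, but first validates the clustering
  // columns against a schema using a resolver (here just a name comparison),
  // which is why it needs the two extra parameters.
  def toProperty(
      schema: Seq[String],
      spec: SimpleClusterBySpec,
      resolver: (String, String) => Boolean): (String, String) = {
    spec.columnNames.foreach { col =>
      require(schema.exists(resolver(_, col)), s"Unknown clustering column: $col")
    }
    clusteringColumnsKey -> spec.columnNames.mkString(",")
  }
}
```

The extra `schema` and `resolver` parameters are exactly what makes the validating variant heavier, which matches the reviewer's explanation for keeping the two methods separate.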
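The mutual-exclusion check that `validatePartitioning` enforces can be modeled with a small Spark-free sketch. The class and error message below are illustrative only; in the real `DataFrameWriterV2`, `bucketBy` also feeds into `partitioning` (as a bucket transform), so the single `partitioning`-vs-`clustering` check covers it too, as the reviewer notes.

```scala
// Illustrative, Spark-free model of the builder-style mutual-exclusion check:
// partitioning and clustering may not both be set on the same writer.
final class ClusteredWriterSketch {
  private var partitioning: Option[Seq[String]] = None
  private var clustering: Option[Seq[String]] = None

  def partitionedBy(cols: String*): this.type = {
    partitioning = Some(cols)
    validatePartitioning()
    this
  }

  def clusterBy(col: String, cols: String*): this.type = {
    clustering = Some(col +: cols)
    validatePartitioning()
    this
  }

  // Fails as soon as both specs are present, mirroring the check in the diff.
  private def validatePartitioning(): Unit = {
    if (partitioning.nonEmpty && clustering.nonEmpty) {
      throw new IllegalArgumentException(
        "Clustering and partitioning cannot both be specified.")
    }
  }
}
```

Calling the check from every setter (rather than once at build time) makes the error surface at the exact call site that introduced the conflict, which is the pattern the diff follows.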