[GitHub] spark pull request #21235: [SPARK-24181][SQL] Better error message for writi...
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/21235

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/21235#discussion_r186606160

Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala

```diff
@@ -339,9 +339,16 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
     }
   }

-  private def assertNotBucketed(operation: String): Unit = {
-    if (numBuckets.isDefined || sortColumnNames.isDefined) {
-      throw new AnalysisException(s"'$operation' does not support bucketing right now")
```

How about minimizing the code changes?

```scala
private def assertNotBucketed(operation: String): Unit = {
  if (getBucketSpec.isDefined) {
    throw new AnalysisException(s"'$operation' does not support bucketing right now")
  }
}
```
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21235#discussion_r186532502

Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala

```diff
@@ -339,9 +339,16 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
     }
   }

-  private def assertNotBucketed(operation: String): Unit = {
-    if (numBuckets.isDefined || sortColumnNames.isDefined) {
-      throw new AnalysisException(s"'$operation' does not support bucketing right now")
```

I agree with you. I also found that in `getBucketSpec`, when `numBuckets.isEmpty && sortColumnNames.isDefined`, it throws `IllegalArgumentException`. How about we alternatively throw `AnalysisException` in all the cases, for consistency?

```scala
private def getBucketSpec: Option[BucketSpec] = {
  assertNotSortByOrBucketedBy()
  numBuckets.map { n =>
    BucketSpec(n, bucketColumnNames.get, sortColumnNames.getOrElse(Nil))
  }
}

private def assertNotSortByOrBucketedBy(): Unit = {
  if (sortColumnNames.isDefined && numBuckets.isEmpty) {
    throw new AnalysisException("sortBy must be used together with bucketBy")
  }
}

private def assertNotBucketedAndNotSorted(operation: String): Unit = {
  assertNotSortByOrBucketedBy()
  if (numBuckets.isDefined) {
    if (sortColumnNames.isDefined) {
      throw new AnalysisException(
        s"'$operation' does not support bucketBy and sortBy within a bucket right now")
    } else {
      throw new AnalysisException(s"'$operation' does not support bucketBy right now")
    }
  }
}
```
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/21235#discussion_r186316448

Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala

```diff
@@ -339,9 +339,16 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
     }
   }

-  private def assertNotBucketed(operation: String): Unit = {
-    if (numBuckets.isDefined || sortColumnNames.isDefined) {
-      throw new AnalysisException(s"'$operation' does not support bucketing right now")
```

How about keeping the function name unchanged, just changing this message, and listing the sort columns if present? Something like:

> '$operation' does not support bucketing. Number of buckets: ...; sortBy: ...;
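The message format gatorsmile suggests could be assembled roughly like this. This is a hedged sketch in plain Scala, not the actual patch; the helper name and its parameters are made up for illustration, and the literal `...` placeholders from the suggestion are filled with the writer's actual values:

```scala
// Hypothetical helper (name and signature assumed, not from the PR):
// build one message that names the operation and lists whichever of the
// bucketing/sorting clauses the caller actually used.
def bucketingErrorMessage(
    operation: String,
    numBuckets: Option[Int],
    sortColumnNames: Option[Seq[String]]): String = {
  val details = Seq(
    numBuckets.map(n => s"Number of buckets: $n"),
    sortColumnNames.map(cols => s"sortBy: ${cols.mkString(", ")}")
  ).flatten.mkString("; ")
  s"'$operation' does not support bucketing. $details"
}
```

A call such as `bucketingErrorMessage("save", Some(4), Some(Seq("ts")))` would then yield `'save' does not support bucketing. Number of buckets: 4; sortBy: ts`, so the user can see at a glance which clauses triggered the error.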
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/21235#discussion_r186241694

Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala

```diff
@@ -339,9 +339,16 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
     }
   }

-  private def assertNotBucketed(operation: String): Unit = {
-    if (numBuckets.isDefined || sortColumnNames.isDefined) {
-      throw new AnalysisException(s"'$operation' does not support bucketing right now")
+  private def assertNotBucketedOrSorted(operation: String): Unit = {
+    (numBuckets.isDefined, sortColumnNames.isDefined) match {
+      case (true, true) =>
+        throw new AnalysisException(
+          s"'$operation' does not support bucketing and sorting right now")
+      case (true, false) =>
+        throw new AnalysisException(s"'$operation' does not support bucketing right now")
+      case (false, true) =>
+        throw new AnalysisException(s"'$operation' does not support sorting right now")
```

Just `'$operation' does not support sortBy right now`?
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/21235#discussion_r186241582

Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala

```diff
@@ -339,9 +339,16 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
     }
   }

-  private def assertNotBucketed(operation: String): Unit = {
-    if (numBuckets.isDefined || sortColumnNames.isDefined) {
-      throw new AnalysisException(s"'$operation' does not support bucketing right now")
+  private def assertNotBucketedOrSorted(operation: String): Unit = {
+    (numBuckets.isDefined, sortColumnNames.isDefined) match {
+      case (true, true) =>
+        throw new AnalysisException(
+          s"'$operation' does not support bucketing and sorting right now")
```

If we want to state it clearly, how about `'$operation' does not support bucketBy and sortBy right now`? That avoids confusion with general sorting.
Github user dbtsai commented on a diff in the pull request:
https://github.com/apache/spark/pull/21235#discussion_r186240261

Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala

```diff
@@ -339,9 +339,16 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
     }
   }

-  private def assertNotBucketed(operation: String): Unit = {
-    if (numBuckets.isDefined || sortColumnNames.isDefined) {
-      throw new AnalysisException(s"'$operation' does not support bucketing right now")
+  private def assertNotBucketedOrSorted(operation: String): Unit = {
+    (numBuckets.isDefined, sortColumnNames.isDefined) match {
+      case (true, true) =>
+        throw new AnalysisException(
+          s"'$operation' does not support bucketing and sorting right now")
+      case (true, false) =>
+        throw new AnalysisException(s"'$operation' does not support bucketing right now")
+      case (false, true) =>
+        throw new AnalysisException(s"'$operation' does not support sorting right now")
```

I know this is the sorting within each bucket. But if a user just calls `writer.sortBy` without calling `bucketBy`, the user will get `s"'$operation' does not support bucketing right now"`, which makes it hard to understand what's going on. For the case where sortBy is set but bucketBy is not, how about I change the error message to `sortBy must be used together with bucketBy, and '$operation' does not support bucketBy right now`?
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/21235#discussion_r186179922

Diff: sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala

```diff
@@ -339,9 +339,16 @@ final class DataFrameWriter[T] private[sql](ds: Dataset[T]) {
     }
   }

-  private def assertNotBucketed(operation: String): Unit = {
-    if (numBuckets.isDefined || sortColumnNames.isDefined) {
-      throw new AnalysisException(s"'$operation' does not support bucketing right now")
+  private def assertNotBucketedOrSorted(operation: String): Unit = {
+    (numBuckets.isDefined, sortColumnNames.isDefined) match {
+      case (true, true) =>
+        throw new AnalysisException(
+          s"'$operation' does not support bucketing and sorting right now")
+      case (true, false) =>
+        throw new AnalysisException(s"'$operation' does not support bucketing right now")
+      case (false, true) =>
+        throw new AnalysisException(s"'$operation' does not support sorting right now")
```

The sorting here is only used to sort data within each bucket. This is different from a general sort.
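The distinction gatorsmile draws (per-bucket sorting versus a global sort) can be illustrated outside Spark with a toy plain-Scala sketch. The hash-into-buckets and per-bucket sort below loosely mirror what `bucketBy`/`sortBy` do to a table's files; nothing here is Spark's actual implementation:

```scala
// Toy model: "bucketBy" hashes each row into one of numBuckets buckets,
// and "sortBy" then sorts rows *within each bucket* only. Concatenating
// the buckets afterwards does NOT yield a globally sorted sequence —
// a global sort (orderBy) is a different operation.
def bucketAndSort(rows: Seq[Int], numBuckets: Int): Map[Int, Seq[Int]] =
  rows
    .groupBy(r => ((r % numBuckets) + numBuckets) % numBuckets) // "bucketBy"
    .map { case (bucket, rs) => bucket -> rs.sorted }           // "sortBy" per bucket

val buckets = bucketAndSort(Seq(9, 1, 6, 4, 3), numBuckets = 2)
// Each bucket's rows come out sorted, e.g. bucket 0 holds the sorted even
// values and bucket 1 the sorted odd values, but the dataset as a whole
// has no single global order.
```

This is why the thread insists the error message name `sortBy` rather than generic "sorting": the clause only ever orders data inside a bucket.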
GitHub user dbtsai opened a pull request:
https://github.com/apache/spark/pull/21235

[SPARK-24181][SQL] Better error message for writing sorted data

## What changes were proposed in this pull request?

The exception message should clearly distinguish sorting and bucketing in `save` and `jdbc` writes. When a user tries to write sorted data using `save` or `insertInto`, it currently throws an exception with the message `s"'$operation' does not support bucketing right now"`. We should throw `s"'$operation' does not support sorting right now"` instead.

## How was this patch tested?

More tests in `DataFrameReaderWriterSuite.scala`

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dbtsai/spark fixException

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21235.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21235

commit 72efec10871ed15eb1bef8b208f9b08de1191456
Author: DB Tsai
Date: 2018-04-27T18:53:29Z

    The exception message should clearly distinguish sorting and bucketing in save() and jdbc write
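The message dispatch the PR description calls for can be sketched standalone as follows. This is plain Scala, not the actual `DataFrameWriter`: `AnalysisException` is stubbed as a local case class, and the two `Option` fields stand in for the writer's internal bucketing state; only the shape of the logic mirrors the proposed change.

```scala
// Stub standing in for org.apache.spark.sql.AnalysisException.
case class AnalysisException(message: String) extends Exception(message)

// Hypothetical stand-in for the writer's state: numBuckets is set by
// bucketBy, sortColumnNames by sortBy.
class WriterState(numBuckets: Option[Int], sortColumnNames: Option[Seq[String]]) {
  // Pick an error message that names exactly the clauses the caller used,
  // instead of reporting "bucketing" for a sortBy-only misuse.
  def assertNotBucketedOrSorted(operation: String): Unit =
    (numBuckets.isDefined, sortColumnNames.isDefined) match {
      case (true, true) =>
        throw AnalysisException(
          s"'$operation' does not support bucketing and sorting right now")
      case (true, false) =>
        throw AnalysisException(s"'$operation' does not support bucketing right now")
      case (false, true) =>
        throw AnalysisException(s"'$operation' does not support sorting right now")
      case (false, false) => () // neither bucketBy nor sortBy: nothing to assert
    }
}
```

With this shape, a writer that only had `sortBy` applied (`new WriterState(None, Some(Seq("ts")))`) now reports "does not support sorting" on `save`, which is the behavior change the PR asks for.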