[GitHub] spark pull request #13837: [SPARK-16126] [SQL] Better Error Message When usi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/13837 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #13837: [SPARK-16126] [SQL] Better Error Message When usi...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13837#discussion_r87733959

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala ---
    @@ -322,6 +323,9 @@ case class DataSource(
               val equality = sparkSession.sessionState.conf.resolver
               StructType(schema.filterNot(f => partitionColumns.exists(equality(_, f.name))))
             }.orElse {
    +          if (allPaths.isEmpty && !format.isInstanceOf[TextFileFormat]) {
    --- End diff --

    Hi @gatorsmile, would it be better to explain here that the text data source is excluded because it always uses a schema consisting of a string field when the schema is not explicitly given?

    BTW, should we maybe change `text.TextFileFormat` to `TextFileFormat` at https://github.com/apache/spark/pull/13837/files#diff-7a6cb188d2ae31eb3347b5629a679cecR139?
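To illustrate the condition discussed in the comment above, here is a minimal, Spark-free sketch. The types `FileFormat`, `TextFileFormat`, and `JsonFileFormat` are simplified stand-ins and `PathGuard.missingPathError` is a hypothetical helper; none of them are Spark's real classes, and the actual change in `DataSource.scala` may differ in detail. Only the condition and the message text are taken from the diff and the PR description.

```scala
// Stand-in types for illustration only; they are not Spark's real classes.
trait FileFormat
class TextFileFormat extends FileFormat
class JsonFileFormat extends FileFormat

object PathGuard {
  // Hypothetical helper: returns the error message when no path was given
  // for a format that needs files to infer a schema, mirroring the
  // condition added in the diff.
  def missingPathError(allPaths: Seq[String], format: FileFormat): Option[String] =
    if (allPaths.isEmpty && !format.isInstanceOf[TextFileFormat]) {
      // Non-text formats have nothing to infer a schema from.
      Some("'path' is not specified")
    } else {
      // Either a path exists, or the format is text, which (per the review
      // comment) uses a string-field schema when none is given.
      None
    }
}
```

Under these assumptions, `PathGuard.missingPathError(Seq.empty, new JsonFileFormat)` yields the message, while the same call with `new TextFileFormat` yields `None`.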
[GitHub] spark pull request #13837: [SPARK-16126] [SQL] Better Error Message When usi...
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13837#discussion_r87728887

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetOptions.scala ---
    @@ -40,7 +40,7 @@ private[sql] class ParquetOptions(
         if (!shortParquetCompressionCodecNames.contains(codecName)) {
           val availableCodecs = shortParquetCompressionCodecNames.keys.map(_.toLowerCase)
           throw new IllegalArgumentException(s"Codec [$codecName] " +
    -        s"is not available. Available codecs are ${availableCodecs.mkString(", ")}.")
    +        s"is not available. Known codecs are ${availableCodecs.mkString(", ")}.")
    --- End diff --

    `Available` was used intentionally because Parquet only supports snappy, gzip, or lzo, whereas the text-based data sources also support other compression codecs, so their message lists only the known ones.
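The validation pattern under discussion can be sketched without Spark or Parquet on the classpath. In this rough illustration, `ParquetCodecCheck` and `resolve` are hypothetical names, the codec table is limited to the three codecs named in the comment above, and the mapped values are placeholder strings rather than Parquet's real codec constants. The names are sorted only to make the message deterministic, which the code in the diff does not do.

```scala
import java.util.Locale

object ParquetCodecCheck {
  // Assumed short-name table; values are placeholders, not Parquet's
  // real codec constants.
  val shortNames: Map[String, String] = Map(
    "snappy" -> "SNAPPY",
    "gzip" -> "GZIP",
    "lzo" -> "LZO")

  // Hypothetical helper: looks up a user-supplied codec name
  // case-insensitively and fails with a message listing the closed set
  // of supported codecs.
  def resolve(name: String): String = {
    val codecName = name.toLowerCase(Locale.ROOT)
    shortNames.getOrElse(codecName, {
      // Sorted here (an addition of this sketch) for a stable message.
      val availableCodecs = shortNames.keys.toSeq.sorted
      throw new IllegalArgumentException(s"Codec [$codecName] " +
        s"is not available. Available codecs are ${availableCodecs.mkString(", ")}.")
    })
  }
}
```

Because the set of codecs is closed in this sketch, "Available" describes the list exactly; a lookup that can also fall back to arbitrary class names would more accurately call its list "Known".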
[GitHub] spark pull request #13837: [SPARK-16126] [SQL] Better Error Message When usi...
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13837#discussion_r87723510

    --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R ---
    @@ -2684,8 +2684,7 @@ test_that("Call DataFrameWriter.load() API in Java without path and check argume
       # It makes sure that we can omit path argument in read.df API and then it calls
       # DataFrameWriter.load() without path.
       expect_error(read.df(source = "json"),
    -               paste("Error in loadDF : analysis error - Unable to infer schema for JSON at .",
    -                     "It must be specified manually"))
    +               paste("Error in loadDF : illegal argument - 'path' is not specified"))
    --- End diff --

    I recall this test is intentionally testing without the path argument? cc @HyukjinKwon
[GitHub] spark pull request #13837: [SPARK-16126] [SQL] Better Error Message When usi...
GitHub user gatorsmile reopened a pull request:

    https://github.com/apache/spark/pull/13837

    [SPARK-16126] [SQL] Better Error Message When using DataFrameReader without `path`

    What changes were proposed in this pull request?

    When users do not specify the path in `DataFrameReader` APIs, they can get a confusing error message. For example,

    ```Scala
    spark.read.json()
    ```

    Error message:

    ```
    Unable to infer schema for JSON at . It must be specified manually;
    ```

    After the fix, the error message will be like:

    ```
    'path' is not specified
    ```

    Another major goal of this PR is to add test cases for the latest changes in https://github.com/apache/spark/pull/13727.
    - orc read APIs
    - illegal format name
    - save API - empty path or illegal path
    - load API - empty path
    - illegal compression
    - fixed a test case in the existing test case `prevent all column partitioning`

    How was this patch tested?

    Test cases are added.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark dfWriterAudit

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13837.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13837

commit 8d021e47e9a4e95ade99d617c77ef1e17245a796
Author: gatorsmile
Date:   2016-06-17T18:24:42Z

    test cases

commit 5e4a3c666dfb767215130df1a778e5f97d438c54
Author: gatorsmile
Date:   2016-06-17T19:58:56Z

    add test cases.

commit 26437151ff0db4c0010510de047f81b1808890f4
Author: gatorsmile
Date:   2016-06-17T23:48:23Z

    fix and test cases

commit cfc0188a0baa45aef1bae6604dd10450eaafd561
Author: gatorsmile
Date:   2016-06-21T01:59:02Z

    Merge remote-tracking branch 'upstream/master' into dfWriterAudit

commit 3007fe66d03a6a40dc530c13d44c27030118a8a4
Author: gatorsmile
Date:   2016-06-21T13:27:16Z

    more test case

commit a1ae7249322c17ea09be4e968535dc115b2acb64
Author: gatorsmile
Date:   2016-06-22T06:12:56Z

    fix test case

commit 635046a10cc059a6ae8756fb7bc7167f5621255c
Author: gatorsmile
Date:   2016-06-22T16:04:51Z

    fix test case
[GitHub] spark pull request #13837: [SPARK-16126] [SQL] Better Error Message When usi...
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/13837
[GitHub] spark pull request #13837: [SPARK-16126] [SQL] Better Error Message When usi...
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13837#discussion_r68341391

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetOptions.scala ---
    @@ -40,7 +40,7 @@ private[sql] class ParquetOptions(
         if (!shortParquetCompressionCodecNames.contains(codecName)) {
           val availableCodecs = shortParquetCompressionCodecNames.keys.map(_.toLowerCase)
           throw new IllegalArgumentException(s"Codec [$codecName] " +
    -        s"is not available. Available codecs are ${availableCodecs.mkString(", ")}.")
    +        s"is not available. Known codecs are ${availableCodecs.mkString(", ")}.")
    --- End diff --

    Just to make it consistent with the output of the other cases. See the code: https://github.com/apache/spark/blob/d6dc12ef0146ae409834c78737c116050961f350/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/CompressionCodecs.scala#L49-L51
[GitHub] spark pull request #13837: [SPARK-16126] [SQL] Better Error Message When usi...
Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13837#discussion_r68324497

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetOptions.scala ---
    @@ -40,7 +40,7 @@ private[sql] class ParquetOptions(
         if (!shortParquetCompressionCodecNames.contains(codecName)) {
           val availableCodecs = shortParquetCompressionCodecNames.keys.map(_.toLowerCase)
           throw new IllegalArgumentException(s"Codec [$codecName] " +
    -        s"is not available. Available codecs are ${availableCodecs.mkString(", ")}.")
    +        s"is not available. Known codecs are ${availableCodecs.mkString(", ")}.")
    --- End diff --

    Why this change?