[spark] branch master updated: [SPARK-27469][CORE] Update Commons BeanUtils to 1.9.3
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new a4cf1a4  [SPARK-27469][CORE] Update Commons BeanUtils to 1.9.3
a4cf1a4 is described below

commit a4cf1a4f4e1b2707059c8c341e06942246cb83bf
Author: Sean Owen
AuthorDate: Mon Apr 15 19:18:37 2019 -0700

    [SPARK-27469][CORE] Update Commons BeanUtils to 1.9.3

    ## What changes were proposed in this pull request?

    Unify commons-beanutils deps to latest 1.9.3. This resolves the version
    inconsistency in Hadoop 2.7's build and also picks up security and bug fixes.

    ## How was this patch tested?

    Existing tests.

    Closes #24378 from srowen/SPARK-27469.

    Authored-by: Sean Owen
    Signed-off-by: Dongjoon Hyun
---
 LICENSE-binary                 |  1 -
 dev/deps/spark-deps-hadoop-2.7 |  3 +--
 pom.xml                        | 10 ++++++++++
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/LICENSE-binary b/LICENSE-binary
index 5f57133..66c5599 100644
--- a/LICENSE-binary
+++ b/LICENSE-binary
@@ -302,7 +302,6 @@ com.google.code.gson:gson
 com.google.inject:guice
 com.google.inject.extensions:guice-servlet
 com.twitter:parquet-hadoop-bundle
-commons-beanutils:commons-beanutils-core
 commons-cli:commons-cli
 commons-dbcp:commons-dbcp
 commons-io:commons-io
diff --git a/dev/deps/spark-deps-hadoop-2.7 b/dev/deps/spark-deps-hadoop-2.7
index 58eb8d0..00dc2ce 100644
--- a/dev/deps/spark-deps-hadoop-2.7
+++ b/dev/deps/spark-deps-hadoop-2.7
@@ -26,8 +26,7 @@ breeze-macros_2.12-0.13.2.jar
 breeze_2.12-0.13.2.jar
 chill-java-0.9.3.jar
 chill_2.12-0.9.3.jar
-commons-beanutils-1.7.0.jar
-commons-beanutils-core-1.8.0.jar
+commons-beanutils-1.9.3.jar
 commons-cli-1.2.jar
 commons-codec-1.10.jar
 commons-collections-3.2.2.jar
diff --git a/pom.xml b/pom.xml
index 0e1c67f..449b426 100644
--- a/pom.xml
+++ b/pom.xml
@@ -469,6 +469,11 @@
         <version>${commons.collections.version}</version>
       </dependency>
       <dependency>
+        <groupId>commons-beanutils</groupId>
+        <artifactId>commons-beanutils</artifactId>
+        <version>1.9.3</version>
+      </dependency>
+      <dependency>
         <groupId>org.apache.ivy</groupId>
         <artifactId>ivy</artifactId>
         <version>${ivy.version}</version>
@@ -911,6 +916,11 @@
         <artifactId>netty</artifactId>
       </exclusion>
+      <exclusion>
+        <groupId>commons-beanutils</groupId>
+        <artifactId>commons-beanutils-core</artifactId>
+      </exclusion>
       <exclusion>
         <groupId>commons-logging</groupId>
         <artifactId>commons-logging</artifactId>
       </exclusion>

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [SPARK-27351][SQL] Wrong outputRows estimation after AggregateEstimation wit…
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 40668c5  [SPARK-27351][SQL] Wrong outputRows estimation after AggregateEstimation wit…
40668c5 is described below

commit 40668c53ed799881db1f316ceaf2f978b294d8ed
Author: pengbo
AuthorDate: Mon Apr 15 15:37:07 2019 -0700

    [SPARK-27351][SQL] Wrong outputRows estimation after AggregateEstimation wit…

    ## What changes were proposed in this pull request?

    The upper bound on the number of group-by output rows is the product of the
    distinct counts of the group-by columns. However, a column containing only
    null values has a distinct count of 0, which incorrectly drives the estimated
    output row count to 0. For example:

    col1 (distinct: 2, rowCount: 2)
    col2 (distinct: 0, rowCount: 2)
    => group by col1, col2

    Actual: output rows: 0
    Expected: output rows: 2

    ## How was this patch tested?

    A corresponding unit test has been added, plus a manual test has been done in
    our TPC-DS benchmark environment.

    Closes #24286 from pengbo/master.

    Lead-authored-by: pengbo
    Co-authored-by: mingbo_pb
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit c58a4fed8d79aff9fbac9f9a33141b2edbfb0cea)
    Signed-off-by: Dongjoon Hyun
---
 .../plans/logical/statsEstimation/AggregateEstimation.scala  | 12 ++++++++++--
 .../catalyst/statsEstimation/AggregateEstimationSuite.scala  | 12 +++++++++++-
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/AggregateEstimation.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/AggregateEstimation.scala
index 111c594..7ef22fa 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/AggregateEstimation.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/AggregateEstimation.scala
@@ -39,8 +39,16 @@ object AggregateEstimation {
     // Multiply distinct counts of group-by columns. This is an upper bound, which assumes
     // the data contains all combinations of distinct values of group-by columns.
     var outputRows: BigInt = agg.groupingExpressions.foldLeft(BigInt(1))(
-      (res, expr) => res *
-        childStats.attributeStats(expr.asInstanceOf[Attribute]).distinctCount.get)
+      (res, expr) => {
+        val columnStat = childStats.attributeStats(expr.asInstanceOf[Attribute])
+        val distinctCount = columnStat.distinctCount.get
+        val distinctValue: BigInt = if (distinctCount == 0 && columnStat.nullCount.get > 0) {
+          1
+        } else {
+          distinctCount
+        }
+        res * distinctValue
+      })
 
     outputRows = if (agg.groupingExpressions.isEmpty) {
       // If there's no group-by columns, the output is a single row containing values of aggregate
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/AggregateEstimationSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/AggregateEstimationSuite.scala
index 8213d56..6bdf8cd 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/AggregateEstimationSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/AggregateEstimationSuite.scala
@@ -38,7 +38,9 @@ class AggregateEstimationSuite extends StatsEstimationTestBase with PlanTest {
     attr("key22") -> ColumnStat(distinctCount = Some(2), min = Some(10), max = Some(20),
       nullCount = Some(0), avgLen = Some(4), maxLen = Some(4)),
     attr("key31") -> ColumnStat(distinctCount = Some(0), min = None, max = None,
-      nullCount = Some(0), avgLen = Some(4), maxLen = Some(4))
+      nullCount = Some(0), avgLen = Some(4), maxLen = Some(4)),
+    attr("key32") -> ColumnStat(distinctCount = Some(0), min = None, max = None,
+      nullCount = Some(4), avgLen = Some(4), maxLen = Some(4))
   ))
 
   private val nameToAttr: Map[String, Attribute] = columnInfo.map(kv => kv._1.name -> kv._1)
@@ -92,6 +94,14 @@ class AggregateEstimationSuite extends StatsEstimationTestBase with PlanTest {
       expectedOutputRowCount = 0)
   }
 
+  test("group-by column with only null value") {
+    checkAggStats(
+      tableColumns = Seq("key22", "key32"),
+      tableRowCount = 6,
+      groupByColumns = Seq("key22", "key32"),
+      expectedOutputRowCount = nameToColInfo("key22")._2.distinctCount.get)
+  }
+
   test("non-cbo estimation") {
     val attributes = Seq("key12").map(nameToAttr)
     val child = StatsTestPlan(
[spark] branch master updated: [SPARK-27351][SQL] Wrong outputRows estimation after AggregateEstimation wit…
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new c58a4fe  [SPARK-27351][SQL] Wrong outputRows estimation after AggregateEstimation wit…
c58a4fe is described below

commit c58a4fed8d79aff9fbac9f9a33141b2edbfb0cea
Author: pengbo
AuthorDate: Mon Apr 15 15:37:07 2019 -0700

    [SPARK-27351][SQL] Wrong outputRows estimation after AggregateEstimation wit…

    ## What changes were proposed in this pull request?

    The upper bound on the number of group-by output rows is the product of the
    distinct counts of the group-by columns. However, a column containing only
    null values has a distinct count of 0, which incorrectly drives the estimated
    output row count to 0. For example:

    col1 (distinct: 2, rowCount: 2)
    col2 (distinct: 0, rowCount: 2)
    => group by col1, col2

    Actual: output rows: 0
    Expected: output rows: 2

    ## How was this patch tested?

    A corresponding unit test has been added, plus a manual test has been done in
    our TPC-DS benchmark environment.

    Closes #24286 from pengbo/master.

    Lead-authored-by: pengbo
    Co-authored-by: mingbo_pb
    Signed-off-by: Dongjoon Hyun
---
 .../plans/logical/statsEstimation/AggregateEstimation.scala  | 12 ++++++++++--
 .../catalyst/statsEstimation/AggregateEstimationSuite.scala  | 12 +++++++++++-
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/AggregateEstimation.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/AggregateEstimation.scala
index 0606d0d..1198d3f 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/AggregateEstimation.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/AggregateEstimation.scala
@@ -39,8 +39,16 @@ object AggregateEstimation {
     // Multiply distinct counts of group-by columns. This is an upper bound, which assumes
     // the data contains all combinations of distinct values of group-by columns.
     var outputRows: BigInt = agg.groupingExpressions.foldLeft(BigInt(1))(
-      (res, expr) => res *
-        childStats.attributeStats(expr.asInstanceOf[Attribute]).distinctCount.get)
+      (res, expr) => {
+        val columnStat = childStats.attributeStats(expr.asInstanceOf[Attribute])
+        val distinctCount = columnStat.distinctCount.get
+        val distinctValue: BigInt = if (distinctCount == 0 && columnStat.nullCount.get > 0) {
+          1
+        } else {
+          distinctCount
+        }
+        res * distinctValue
+      })
 
     outputRows = if (agg.groupingExpressions.isEmpty) {
       // If there's no group-by columns, the output is a single row containing values of aggregate
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/AggregateEstimationSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/AggregateEstimationSuite.scala
index dfa6e46..c247050 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/AggregateEstimationSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/AggregateEstimationSuite.scala
@@ -38,7 +38,9 @@ class AggregateEstimationSuite extends StatsEstimationTestBase with PlanTest {
     attr("key22") -> ColumnStat(distinctCount = Some(2), min = Some(10), max = Some(20),
       nullCount = Some(0), avgLen = Some(4), maxLen = Some(4)),
     attr("key31") -> ColumnStat(distinctCount = Some(0), min = None, max = None,
-      nullCount = Some(0), avgLen = Some(4), maxLen = Some(4))
+      nullCount = Some(0), avgLen = Some(4), maxLen = Some(4)),
+    attr("key32") -> ColumnStat(distinctCount = Some(0), min = None, max = None,
+      nullCount = Some(4), avgLen = Some(4), maxLen = Some(4))
   ))
 
   private val nameToAttr: Map[String, Attribute] = columnInfo.map(kv => kv._1.name -> kv._1)
@@ -116,6 +118,14 @@ class AggregateEstimationSuite extends StatsEstimationTestBase with PlanTest {
       expectedOutputRowCount = 0)
   }
 
+  test("group-by column with only null value") {
+    checkAggStats(
+      tableColumns = Seq("key22", "key32"),
+      tableRowCount = 6,
+      groupByColumns = Seq("key22", "key32"),
+      expectedOutputRowCount = nameToColInfo("key22")._2.distinctCount.get)
+  }
+
   test("non-cbo estimation") {
     val attributes = Seq("key12").map(nameToAttr)
    val child = StatsTestPlan(
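The corrected fold can be illustrated outside Spark. Below is a minimal standalone sketch in plain Scala; `ColStat` and `estimateGroupByRows` are simplified stand-ins for Spark's `ColumnStat` and `AggregateEstimation`, not the real API. A group-by column whose distinct count is 0 but whose null count is positive now contributes a factor of 1 (the single null group) instead of collapsing the whole product to 0.

```scala
// Simplified stand-in for Spark's ColumnStat: only the two fields the fix consults.
case class ColStat(distinctCount: BigInt, nullCount: BigInt)

// Upper-bound estimate of group-by output rows: the product of per-column
// distinct counts, where an all-null column counts as one distinct group.
def estimateGroupByRows(groupByStats: Seq[ColStat]): BigInt =
  groupByStats.foldLeft(BigInt(1)) { (res, stat) =>
    val factor =
      if (stat.distinctCount == 0 && stat.nullCount > 0) BigInt(1) // all-null column: one null group
      else stat.distinctCount
    res * factor
  }
```

With the example from the commit message — col1 (distinct: 2) grouped with an all-null col2 (distinct: 0, nulls: 2) — the estimate is now 2 rather than 0. Spark additionally caps the estimate at the child plan's row count; that step is omitted here.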
[spark] branch master updated: [SPARK-27454][ML][SQL] Spark image datasource fail when encounter some illegal images
This is an automated email from the ASF dual-hosted git repository.

meng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new d35e81f  [SPARK-27454][ML][SQL] Spark image datasource fail when encounter some illegal images
d35e81f is described below

commit d35e81f4bc561598676a508319ec872f7361b069
Author: WeichenXu
AuthorDate: Mon Apr 15 11:55:51 2019 -0700

    [SPARK-27454][ML][SQL] Spark image datasource fail when encounter some illegal images

    ## What changes were proposed in this pull request?

    Fix the Spark image datasource failing on some illegal images. This relates to
    bugs inside `ImageIO.read`, so in the Spark code I add exception handling around it.

    ## How was this patch tested?

    N/A

    Closes #24362 from WeichenXu123/fix_image_ds_bug.

    Authored-by: WeichenXu
    Signed-off-by: Xiangrui Meng
---
 mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala b/mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala
index 0b13eef..a7ddf2f 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala
@@ -133,7 +133,13 @@ object ImageSchema {
    */
   private[spark] def decode(origin: String, bytes: Array[Byte]): Option[Row] = {
 
-    val img = ImageIO.read(new ByteArrayInputStream(bytes))
+    val img = try {
+      ImageIO.read(new ByteArrayInputStream(bytes))
+    } catch {
+      // Catch runtime exceptions because `ImageIO` may throw an unexpected `RuntimeException`,
+      // but do not catch the declared `IOException` (regarded as a FileSystem failure).
+      case _: RuntimeException => null
+    }
 
     if (img == null) {
       None
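The defensive pattern the patch applies can be sketched independently of Spark (hypothetical `safeDecode` helper below, not Spark's actual `ImageSchema.decode`): `ImageIO.read` returns null for byte streams no registered reader understands, but some malformed images instead make it throw a `RuntimeException`; catching only that, and not the declared `IOException` (which still signals a filesystem-level failure), folds both cases into `None`.

```scala
import java.awt.image.BufferedImage
import java.io.ByteArrayInputStream
import javax.imageio.ImageIO

// Decode image bytes, treating both "no reader found" (null result) and
// runtime failures inside ImageIO as an undecodable image.
def safeDecode(bytes: Array[Byte]): Option[BufferedImage] =
  try {
    Option(ImageIO.read(new ByteArrayInputStream(bytes)))
  } catch {
    case _: RuntimeException => None // malformed image tripped a bug inside ImageIO
  }
```

Declared `IOException`s still propagate to the caller, which keeps genuine I/O problems visible instead of silently dropping rows for them.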
[GitHub] [spark-website] srowen commented on a change in pull request #195: [SPARK-27458][Documentation] remind developers to reset maven home in IntelliJ
srowen commented on a change in pull request #195: [SPARK-27458][Documentation] remind developers to reset maven home in IntelliJ
URL: https://github.com/apache/spark-website/pull/195#discussion_r275453837


 ## File path: developer-tools.md ##

 @@ -397,6 +397,18 @@ Other tips:
 
 - "Rebuild Project" can fail the first time the project is compiled, because generate source files are not automatically generated. Try clicking the "Generate Sources and Update Folders For All Projects" button in the "Maven Projects" tool window to manually generate these sources.
 +- Maven bundled in IntelliJ may not meet the minimum version requirement of the Spark. If that happens,

 Review comment:
   "The version of Maven bundled with IntelliJ may not be new enough for Spark. ..."

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services
[GitHub] [spark-website] srowen commented on a change in pull request #195: [SPARK-27458][Documentation] remind developers to reset maven home in IntelliJ
srowen commented on a change in pull request #195: [SPARK-27458][Documentation] remind developers to reset maven home in IntelliJ
URL: https://github.com/apache/spark-website/pull/195#discussion_r275454281


 ## File path: developer-tools.md ##

 @@ -397,6 +397,18 @@ Other tips:
 
 - "Rebuild Project" can fail the first time the project is compiled, because generate source files are not automatically generated. Try clicking the "Generate Sources and Update Folders For All Projects" button in the "Maven Projects" tool window to manually generate these sources.
 +- Maven bundled in IntelliJ may not meet the minimum version requirement of the Spark. If that happens,
 +the action "Generate Sources and Update Folders For All Projects" could fail silently. If you saw error like
 +```
 +2019-04-14 16:05:24,796 [ 314609]   INFO - #org.jetbrains.idea.maven - [WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed with message:
 +Detected Maven Version: 3.3.9 is not in the allowed range 3.6.0.
 +2019-04-14 16:05:24,813 [ 314626]   INFO - #org.jetbrains.idea.maven - org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M2:enforce (enforce-versions) on project spark-parent_2.12: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed.

 Review comment:
   Delete this and the next line; they're not that relevant
[GitHub] [spark-website] srowen commented on a change in pull request #195: [SPARK-27458][Documentation] remind developers to reset maven home in IntelliJ
srowen commented on a change in pull request #195: [SPARK-27458][Documentation] remind developers to reset maven home in IntelliJ
URL: https://github.com/apache/spark-website/pull/195#discussion_r275454211


 ## File path: developer-tools.md ##

 @@ -397,6 +397,18 @@ Other tips:
 
 - "Rebuild Project" can fail the first time the project is compiled, because generate source files are not automatically generated. Try clicking the "Generate Sources and Update Folders For All Projects" button in the "Maven Projects" tool window to manually generate these sources.
 +- Maven bundled in IntelliJ may not meet the minimum version requirement of the Spark. If that happens,
 +the action "Generate Sources and Update Folders For All Projects" could fail silently. If you saw error like
 +```
 +2019-04-14 16:05:24,796 [ 314609]   INFO - #org.jetbrains.idea.maven - [WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed with message:
 +Detected Maven Version: 3.3.9 is not in the allowed range 3.6.0.
 +2019-04-14 16:05:24,813 [ 314626]   INFO - #org.jetbrains.idea.maven - org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M2:enforce (enforce-versions) on project spark-parent_2.12: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed.
 +java.lang.RuntimeException: org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M2:enforce (enforce-versions) on project spark-parent_2.12: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed.
 +```
 +in IntelliJ log file (`Help -> Show Log in Finder/Explorer`), you should reset the maven home directory

 Review comment:
   maven -> Maven

   I don't think you need to look at IJ's log files; it's just an update to preferences.
[GitHub] [spark-website] William1104 opened a new pull request #195: remind developers to reset maven home in IntelliJ
William1104 opened a new pull request #195: remind developers to reset maven home in IntelliJ
URL: https://github.com/apache/spark-website/pull/195

   I tried to follow the guide at 'http://spark.apache.org/developer-tools.html' to set up an IntelliJ project for Spark. However, the project failed to build, due to classes missing from the sql/catalyst project that should have been generated via ANTLR, even though I clicked the 'Generate Sources and Update Folders For All Projects' button in IntelliJ as suggested.

   It turned out that I had forgotten to reset the Maven home in my IntelliJ, and IntelliJ failed the 'Generate Sources and Update Folders For All Projects' action silently. That was why the ANTLR4 files were not generated as expected.

   To help other developers, I would like to enhance 'http://spark.apache.org/developer-tools.html' with a note reminding developers to check whether the 'Generate Sources and Update Folders For All Projects' action failed silently due to an incorrect Maven version. If so, they should update the Maven home in IntelliJ accordingly.
[GitHub] [spark-website] srowen commented on a change in pull request #194: Remove links to dead orgs / meetups; fix some broken links
srowen commented on a change in pull request #194: Remove links to dead orgs / meetups; fix some broken links
URL: https://github.com/apache/spark-website/pull/194#discussion_r275397031


 ## File path: developer-tools.md ##

 @@ -463,25 +463,16 @@ in the Eclipse install directory. Increase the following setting as needed:
 
 Nightly Builds
 
-Packages are built regularly off of Spark's master branch and release branches. These provide
-Spark developers access to the bleeding-edge of Spark master or the most recent fixes not yet
-incorporated into a maintenance release. These should only be used by Spark developers, as they
-may have bugs and have not undergone the same level of testing as releases. Spark nightly packages
-are available at:
-
-- Latest master build: <a href="https://people.apache.org/~pwendell/spark-nightly/spark-master-bin/latest">https://people.apache.org/~pwendell/spark-nightly/spark-master-bin/latest</a>
-- All nightly builds: <a href="https://people.apache.org/~pwendell/spark-nightly/">https://people.apache.org/~pwendell/spark-nightly/</a>
-
-Spark also publishes SNAPSHOT releases of its Maven artifacts for both master and maintenance
+Spark publishes SNAPSHOT releases of its Maven artifacts for both master and maintenance

 Review comment:
   We don't publish nightly builds anymore.
[GitHub] [spark-website] srowen commented on a change in pull request #194: Remove links to dead orgs / meetups; fix some broken links
srowen commented on a change in pull request #194: Remove links to dead orgs / meetups; fix some broken links
URL: https://github.com/apache/spark-website/pull/194#discussion_r275397252


 ## File path: powered-by.md ##

 @@ -47,16 +47,13 @@ initially launched Spark
 
 - <a href="http://alluxio.com/">Alluxio</a> - Alluxio, formerly Tachyon, is the world's first
 system that unifies disparate storage systems at memory speed.
-- <a href="http://alpinenow.com/">Alpine Data Labs</a>

 Review comment:
   The removed orgs don't exist anymore
[GitHub] [spark-website] srowen opened a new pull request #194: Remove links to dead orgs / meetups; fix some broken links
srowen opened a new pull request #194: Remove links to dead orgs / meetups; fix some broken links
URL: https://github.com/apache/spark-website/pull/194
[spark] branch master updated: [SPARK-27444][SQL][FOLLOWUP][MINOR][TEST] Add a test for describing multi select query.
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 3ab96d7  [SPARK-27444][SQL][FOLLOWUP][MINOR][TEST] Add a test for describing multi select query.
3ab96d7 is described below

commit 3ab96d7acf870e53c9016b0b63d0b328eec23bed
Author: Dilip Biswal
AuthorDate: Mon Apr 15 21:26:45 2019 +0800

    [SPARK-27444][SQL][FOLLOWUP][MINOR][TEST] Add a test for describing multi select query.

    ## What changes were proposed in this pull request?

    This is a minor PR to add a test describing a multi-select query.

    ## How was this patch tested?

    Added a test in describe-query.sql.

    Closes #24370 from dilipbiswal/describe-query-multiselect-test.

    Authored-by: Dilip Biswal
    Signed-off-by: Wenchen Fan
---
 .../spark/sql/execution/command/tables.scala       |  4 ++-
 .../resources/sql-tests/inputs/describe-query.sql  |  6 ++--
 .../sql-tests/results/describe-query.sql.out       | 39 ++++++++++++++--------
 3 files changed, 30 insertions(+), 19 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
index fb619a7..b31b2d3 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
@@ -635,7 +635,9 @@ case class DescribeTableCommand(
  * 3. VALUES statement.
  * 4. TABLE statement. Example : TABLE table_name
  * 5. statements of the form 'FROM table SELECT *'
- * 6. Common table expressions (CTEs)
+ * 6. Multi select statements of the following form:
+ *    select * from (from a select * select *)
+ * 7. Common table expressions (CTEs)
  */
 case class DescribeQueryCommand(query: LogicalPlan) extends DescribeCommandBase {
 
diff --git a/sql/core/src/test/resources/sql-tests/inputs/describe-query.sql b/sql/core/src/test/resources/sql-tests/inputs/describe-query.sql
index bc144d0..b6351f9 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/describe-query.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/describe-query.sql
@@ -10,11 +10,11 @@ DESC SELECT 10.00D as col1;
 DESC QUERY SELECT key FROM desc_temp1 UNION ALL select CAST(1 AS DOUBLE);
 DESC QUERY VALUES(1.00D, 'hello') as tab1(col1, col2);
 DESC QUERY FROM desc_temp1 a SELECT *;
-
-
--- Error cases.
 DESC WITH s AS (SELECT 'hello' as col1) SELECT * FROM s;
 DESCRIBE QUERY WITH s AS (SELECT * from desc_temp1) SELECT * FROM s;
+DESCRIBE SELECT * FROM (FROM desc_temp2 select * select *);
+
+-- Error cases.
 DESCRIBE INSERT INTO desc_temp1 values (1, 'val1');
 DESCRIBE INSERT INTO desc_temp1 SELECT * FROM desc_temp2;
 DESCRIBE
diff --git a/sql/core/src/test/resources/sql-tests/results/describe-query.sql.out b/sql/core/src/test/resources/sql-tests/results/describe-query.sql.out
index fc51b46..15a346f 100644
--- a/sql/core/src/test/resources/sql-tests/results/describe-query.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/describe-query.sql.out
@@ -1,5 +1,5 @@
 -- Automatically generated by SQLQueryTestSuite
--- Number of queries: 16
+-- Number of queries: 17


 -- !query 0
@@ -97,10 +97,19 @@
 val	string


 -- !query 11
-DESCRIBE INSERT INTO desc_temp1 values (1, 'val1')
+DESCRIBE SELECT * FROM (FROM desc_temp2 select * select *)
 -- !query 11 schema
-struct<>
+struct<col_name:string,data_type:string,comment:string>
 -- !query 11 output
+key	int
+val	string
+
+
+-- !query 12
+DESCRIBE INSERT INTO desc_temp1 values (1, 'val1')
+-- !query 12 schema
+struct<>
+-- !query 12 output
 org.apache.spark.sql.catalyst.parser.ParseException

 mismatched input 'desc_temp1' expecting {, '.'}(line 1, pos 21)

 == SQL ==
 DESCRIBE INSERT INTO desc_temp1 values (1, 'val1')
 ---------------------^^^


--- !query 12
+-- !query 13
 DESCRIBE INSERT INTO desc_temp1 SELECT * FROM desc_temp2
--- !query 12 schema
+-- !query 13 schema
 struct<>
--- !query 12 output
+-- !query 13 output
 org.apache.spark.sql.catalyst.parser.ParseException

 mismatched input 'desc_temp1' expecting {, '.'}(line 1, pos 21)

 == SQL ==
 DESCRIBE INSERT INTO desc_temp1 SELECT * FROM desc_temp2
 ---------------------^^^


--- !query 13
+-- !query 14
 DESCRIBE FROM desc_temp1 a insert into desc_temp1 select * insert into desc_temp2 select *
--- !query 13 schema
+-- !query 14 schema
 struct<>
--- !query 13 output
+-- !query 14 output
 org.apache.spark.sql.catalyst.parser.ParseException

 mismatched input 'insert' expecting {, '(', ',', 'ANTI', 'CLUSTER', 'CROSS', 'DISTRIBUTE', 'EXCEPT', 'FULL', 'GROUP', 'HAVING', 'INNER', 'INTERSECT', 'JOIN', 'LATERAL', 'LEFT', 'LIMIT', 'NATURAL', 'ORDER', 'PIVOT', 'RIGHT', 'SELECT', 'SEMI',
[spark] branch master updated: [SPARK-27459][SQL] Revise the exception message of schema inference failure in file source V2
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 27d625d  [SPARK-27459][SQL] Revise the exception message of schema inference failure in file source V2
27d625d is described below

commit 27d625d785244ae78287e3a0eede44c79dfcbb92
Author: Gengliang Wang
AuthorDate: Mon Apr 15 21:06:03 2019 +0800

    [SPARK-27459][SQL] Revise the exception message of schema inference failure in file source V2

    ## What changes were proposed in this pull request?

    Since https://github.com/apache/spark/pull/23383/files#diff-db4a140579c1ac4b1dbec7fe5057eecaR36,
    the exception message for a schema inference failure in file source V2 uses `tableName`,
    which is equivalent to `shortName + path`, while in file source V1 the message is
    `Unable to infer schema from ORC/CSV/JSON...`. We should make the V2 message consistent
    with V1, so that the related test cases don't need to be modified in the future migration.
    https://github.com/apache/spark/pull/24058#pullrequestreview-226364350

    ## How was this patch tested?

    Revert the unit test cases modified in
    https://github.com/apache/spark/pull/24005/files#diff-b9ddfbc9be8d83ecf100b3b8ff9610b9R431 and
    https://github.com/apache/spark/pull/23383/files#diff-9ab56940ee5a53f2bb81e3c008653362R577,
    and test with them.

    Closes #24369 from gengliangwang/reviseInferSchemaMessage.

    Authored-by: Gengliang Wang
    Signed-off-by: Wenchen Fan
---
 .../scala/org/apache/spark/sql/execution/datasources/v2/FileTable.scala | 2 +-
 .../org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala  | 2 +-
 .../scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala    | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileTable.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileTable.scala
index cb816d6..c0c57b8 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileTable.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileTable.scala
@@ -54,7 +54,7 @@ abstract class FileTable(
       inferSchema(fileIndex.allFiles())
     }.getOrElse {
       throw new AnalysisException(
-        s"Unable to infer schema for $name. It must be specified manually.")
+        s"Unable to infer schema for $formatName. It must be specified manually.")
     }.asNullable

   override lazy val schema: StructType = {
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala
index fe40b9a..18ec3e3 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala
@@ -580,7 +580,7 @@ abstract class OrcQueryTest extends OrcTest {
     val m1 = intercept[AnalysisException] {
       testAllCorruptFiles()
     }.getMessage
-    assert(m1.contains("Unable to infer schema"))
+    assert(m1.contains("Unable to infer schema for ORC"))
     testAllCorruptFilesWithoutSchemaInfer()
   }

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
index 2569085..9f96947 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
@@ -428,7 +428,7 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be
     val message = intercept[AnalysisException] {
       testRead(spark.read.csv(), Seq.empty, schema)
     }.getMessage
-    assert(message.toLowerCase(Locale.ROOT).contains("unable to infer schema for csv"))
+    assert(message.contains("Unable to infer schema for CSV. It must be specified manually."))

     testRead(spark.read.csv(dir), data, schema)
     testRead(spark.read.csv(dir, dir), data ++ data, schema)
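The change boils down to which name the fallback error message interpolates. Below is a minimal sketch of the pattern as a free function (hypothetical `resolveSchema`, not Spark's `FileTable` API): prefer a user-specified schema, then inference, and only then fail with a message that names the format ("CSV", "ORC", ...) rather than the table name/path, matching file source V1.

```scala
// Resolve a schema the way the patched FileTable does: a user-specified schema
// wins, otherwise inference, otherwise an error naming the file format.
def resolveSchema(userSpecified: Option[String],
                  inferred: => Option[String],
                  formatName: String): String =
  userSpecified.orElse(inferred).getOrElse {
    throw new IllegalArgumentException(
      s"Unable to infer schema for $formatName. It must be specified manually.")
  }
```

Spark throws `AnalysisException` here; `IllegalArgumentException` stands in to keep the sketch dependency-free. The by-name `inferred` parameter mirrors the lazy `Try { inferSchema(...) }` in the real code, so inference only runs when no schema was supplied.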