[spark] branch master updated: [SPARK-27469][CORE] Update Commons BeanUtils to 1.9.3
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new a4cf1a4  [SPARK-27469][CORE] Update Commons BeanUtils to 1.9.3
a4cf1a4 is described below

commit a4cf1a4f4e1b2707059c8c341e06942246cb83bf
Author: Sean Owen
AuthorDate: Mon Apr 15 19:18:37 2019 -0700

    [SPARK-27469][CORE] Update Commons BeanUtils to 1.9.3

    ## What changes were proposed in this pull request?

    Unify commons-beanutils deps to latest 1.9.3. This resolves the version
    inconsistency in Hadoop 2.7's build and also picks up security and bug fixes.

    ## How was this patch tested?

    Existing tests.

    Closes #24378 from srowen/SPARK-27469.

    Authored-by: Sean Owen
    Signed-off-by: Dongjoon Hyun
---
 LICENSE-binary                 |  1 -
 dev/deps/spark-deps-hadoop-2.7 |  3 +--
 pom.xml                        | 10 ++++++++++
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/LICENSE-binary b/LICENSE-binary
index 5f57133..66c5599 100644
--- a/LICENSE-binary
+++ b/LICENSE-binary
@@ -302,7 +302,6 @@ com.google.code.gson:gson
 com.google.inject:guice
 com.google.inject.extensions:guice-servlet
 com.twitter:parquet-hadoop-bundle
-commons-beanutils:commons-beanutils-core
 commons-cli:commons-cli
 commons-dbcp:commons-dbcp
 commons-io:commons-io
diff --git a/dev/deps/spark-deps-hadoop-2.7 b/dev/deps/spark-deps-hadoop-2.7
index 58eb8d0..00dc2ce 100644
--- a/dev/deps/spark-deps-hadoop-2.7
+++ b/dev/deps/spark-deps-hadoop-2.7
@@ -26,8 +26,7 @@ breeze-macros_2.12-0.13.2.jar
 breeze_2.12-0.13.2.jar
 chill-java-0.9.3.jar
 chill_2.12-0.9.3.jar
-commons-beanutils-1.7.0.jar
-commons-beanutils-core-1.8.0.jar
+commons-beanutils-1.9.3.jar
 commons-cli-1.2.jar
 commons-codec-1.10.jar
 commons-collections-3.2.2.jar
diff --git a/pom.xml b/pom.xml
index 0e1c67f..449b426 100644
--- a/pom.xml
+++ b/pom.xml
@@ -469,6 +469,11 @@
         <version>${commons.collections.version}</version>
       </dependency>
       <dependency>
+        <groupId>commons-beanutils</groupId>
+        <artifactId>commons-beanutils</artifactId>
+        <version>1.9.3</version>
+      </dependency>
+      <dependency>
         <groupId>org.apache.ivy</groupId>
         <artifactId>ivy</artifactId>
         <version>${ivy.version}</version>
@@ -911,6 +916,11 @@
         <artifactId>netty</artifactId>
       </exclusion>
+      <exclusion>
+        <groupId>commons-beanutils</groupId>
+        <artifactId>commons-beanutils-core</artifactId>
+      </exclusion>
       <exclusion>
         <groupId>commons-logging</groupId>
         <artifactId>commons-logging</artifactId>
       </exclusion>

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [SPARK-27351][SQL] Wrong outputRows estimation after AggregateEstimation wit…
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 40668c5  [SPARK-27351][SQL] Wrong outputRows estimation after AggregateEstimation wit…
40668c5 is described below

commit 40668c53ed799881db1f316ceaf2f978b294d8ed
Author: pengbo
AuthorDate: Mon Apr 15 15:37:07 2019 -0700

    [SPARK-27351][SQL] Wrong outputRows estimation after AggregateEstimation wit…

    ## What changes were proposed in this pull request?

    The upper bound on the number of group-by output rows is the product of the
    distinct counts of the group-by columns. However, a column containing only
    null values has a distinct count of 0, which incorrectly drives the estimated
    output row count to 0. For example:

    col1 (distinct: 2, rowCount: 2)
    col2 (distinct: 0, rowCount: 2)
    => group by col1, col2

    Actual: output rows: 0
    Expected: output rows: 2

    ## How was this patch tested?

    A corresponding unit test has been added, plus a manual test has been done in
    our TPC-DS benchmark environment.

    Closes #24286 from pengbo/master.

    Lead-authored-by: pengbo
    Co-authored-by: mingbo_pb
    Signed-off-by: Dongjoon Hyun
    (cherry picked from commit c58a4fed8d79aff9fbac9f9a33141b2edbfb0cea)
    Signed-off-by: Dongjoon Hyun
---
 .../plans/logical/statsEstimation/AggregateEstimation.scala  | 12 ++++++++++--
 .../catalyst/statsEstimation/AggregateEstimationSuite.scala  | 12 +++++++++++-
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/AggregateEstimation.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/AggregateEstimation.scala
index 111c594..7ef22fa 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/AggregateEstimation.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/AggregateEstimation.scala
@@ -39,8 +39,16 @@ object AggregateEstimation {
     // Multiply distinct counts of group-by columns. This is an upper bound, which assumes
     // the data contains all combinations of distinct values of group-by columns.
     var outputRows: BigInt = agg.groupingExpressions.foldLeft(BigInt(1))(
-      (res, expr) => res *
-        childStats.attributeStats(expr.asInstanceOf[Attribute]).distinctCount.get)
+      (res, expr) => {
+        val columnStat = childStats.attributeStats(expr.asInstanceOf[Attribute])
+        val distinctCount = columnStat.distinctCount.get
+        val distinctValue: BigInt = if (distinctCount == 0 && columnStat.nullCount.get > 0) {
+          1
+        } else {
+          distinctCount
+        }
+        res * distinctValue
+      })
 
     outputRows = if (agg.groupingExpressions.isEmpty) {
       // If there's no group-by columns, the output is a single row containing values of aggregate
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/AggregateEstimationSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/AggregateEstimationSuite.scala
index 8213d56..6bdf8cd 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/AggregateEstimationSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/AggregateEstimationSuite.scala
@@ -38,7 +38,9 @@ class AggregateEstimationSuite extends StatsEstimationTestBase with PlanTest {
     attr("key22") -> ColumnStat(distinctCount = Some(2), min = Some(10), max = Some(20),
       nullCount = Some(0), avgLen = Some(4), maxLen = Some(4)),
     attr("key31") -> ColumnStat(distinctCount = Some(0), min = None, max = None,
-      nullCount = Some(0), avgLen = Some(4), maxLen = Some(4))
+      nullCount = Some(0), avgLen = Some(4), maxLen = Some(4)),
+    attr("key32") -> ColumnStat(distinctCount = Some(0), min = None, max = None,
+      nullCount = Some(4), avgLen = Some(4), maxLen = Some(4))
   ))
 
   private val nameToAttr: Map[String, Attribute] = columnInfo.map(kv => kv._1.name -> kv._1)
@@ -92,6 +94,14 @@ class AggregateEstimationSuite extends StatsEstimationTestBase with PlanTest {
       expectedOutputRowCount = 0)
   }
 
+  test("group-by column with only null value") {
+    checkAggStats(
+      tableColumns = Seq("key22", "key32"),
+      tableRowCount = 6,
+      groupByColumns = Seq("key22", "key32"),
+      expectedOutputRowCount = nameToColInfo("key22")._2.distinctCount.get)
+  }
+
   test("non-cbo estimation") {
     val attributes = Seq("key12").map(nameToAttr)
     val child = StatsTestPlan(
[spark] branch master updated: [SPARK-27351][SQL] Wrong outputRows estimation after AggregateEstimation wit…
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new c58a4fe  [SPARK-27351][SQL] Wrong outputRows estimation after AggregateEstimation wit…
c58a4fe is described below

commit c58a4fed8d79aff9fbac9f9a33141b2edbfb0cea
Author: pengbo
AuthorDate: Mon Apr 15 15:37:07 2019 -0700

    [SPARK-27351][SQL] Wrong outputRows estimation after AggregateEstimation wit…

    ## What changes were proposed in this pull request?

    The upper bound on the number of group-by output rows is the product of the
    distinct counts of the group-by columns. However, a column containing only
    null values has a distinct count of 0, which incorrectly drives the estimated
    output row count to 0. For example:

    col1 (distinct: 2, rowCount: 2)
    col2 (distinct: 0, rowCount: 2)
    => group by col1, col2

    Actual: output rows: 0
    Expected: output rows: 2

    ## How was this patch tested?

    A corresponding unit test has been added, plus a manual test has been done in
    our TPC-DS benchmark environment.

    Closes #24286 from pengbo/master.

    Lead-authored-by: pengbo
    Co-authored-by: mingbo_pb
    Signed-off-by: Dongjoon Hyun
---
 .../plans/logical/statsEstimation/AggregateEstimation.scala  | 12 ++++++++++--
 .../catalyst/statsEstimation/AggregateEstimationSuite.scala  | 12 +++++++++++-
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/AggregateEstimation.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/AggregateEstimation.scala
index 0606d0d..1198d3f 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/AggregateEstimation.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/statsEstimation/AggregateEstimation.scala
@@ -39,8 +39,16 @@ object AggregateEstimation {
     // Multiply distinct counts of group-by columns. This is an upper bound, which assumes
     // the data contains all combinations of distinct values of group-by columns.
     var outputRows: BigInt = agg.groupingExpressions.foldLeft(BigInt(1))(
-      (res, expr) => res *
-        childStats.attributeStats(expr.asInstanceOf[Attribute]).distinctCount.get)
+      (res, expr) => {
+        val columnStat = childStats.attributeStats(expr.asInstanceOf[Attribute])
+        val distinctCount = columnStat.distinctCount.get
+        val distinctValue: BigInt = if (distinctCount == 0 && columnStat.nullCount.get > 0) {
+          1
+        } else {
+          distinctCount
+        }
+        res * distinctValue
+      })
 
     outputRows = if (agg.groupingExpressions.isEmpty) {
       // If there's no group-by columns, the output is a single row containing values of aggregate
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/AggregateEstimationSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/AggregateEstimationSuite.scala
index dfa6e46..c247050 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/AggregateEstimationSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/statsEstimation/AggregateEstimationSuite.scala
@@ -38,7 +38,9 @@ class AggregateEstimationSuite extends StatsEstimationTestBase with PlanTest {
     attr("key22") -> ColumnStat(distinctCount = Some(2), min = Some(10), max = Some(20),
       nullCount = Some(0), avgLen = Some(4), maxLen = Some(4)),
     attr("key31") -> ColumnStat(distinctCount = Some(0), min = None, max = None,
-      nullCount = Some(0), avgLen = Some(4), maxLen = Some(4))
+      nullCount = Some(0), avgLen = Some(4), maxLen = Some(4)),
+    attr("key32") -> ColumnStat(distinctCount = Some(0), min = None, max = None,
+      nullCount = Some(4), avgLen = Some(4), maxLen = Some(4))
   ))
 
   private val nameToAttr: Map[String, Attribute] = columnInfo.map(kv => kv._1.name -> kv._1)
@@ -116,6 +118,14 @@ class AggregateEstimationSuite extends StatsEstimationTestBase with PlanTest {
       expectedOutputRowCount = 0)
   }
 
+  test("group-by column with only null value") {
+    checkAggStats(
+      tableColumns = Seq("key22", "key32"),
+      tableRowCount = 6,
+      groupByColumns = Seq("key22", "key32"),
+      expectedOutputRowCount = nameToColInfo("key22")._2.distinctCount.get)
+  }
+
   test("non-cbo estimation") {
     val attributes = Seq("key12").map(nameToAttr)
    val child = StatsTestPlan(
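The corrected fold can be illustrated outside Spark. Below is a minimal standalone sketch in plain Scala; `ColStat` and `estimateGroupByRows` are simplified stand-ins for Spark's `ColumnStat` and `AggregateEstimation`, not the real API. A group-by column whose distinct count is 0 but whose null count is positive now contributes a factor of 1 (the single null group) instead of collapsing the whole product to 0.

```scala
// Simplified stand-in for Spark's ColumnStat: only the two fields the fix consults.
case class ColStat(distinctCount: BigInt, nullCount: BigInt)

// Upper-bound estimate of group-by output rows: the product of per-column
// distinct counts, where an all-null column counts as one distinct group.
def estimateGroupByRows(groupByStats: Seq[ColStat]): BigInt =
  groupByStats.foldLeft(BigInt(1)) { (res, stat) =>
    val factor =
      if (stat.distinctCount == 0 && stat.nullCount > 0) BigInt(1) // all-null column: one null group
      else stat.distinctCount
    res * factor
  }
```

With the example from the commit message — col1 (distinct: 2) grouped with an all-null col2 (distinct: 0, nulls: 2) — the estimate is now 2 rather than 0. Spark additionally caps the estimate at the child plan's row count; that step is omitted here.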
[spark] branch master updated: [SPARK-27454][ML][SQL] Spark image datasource fail when encounter some illegal images
This is an automated email from the ASF dual-hosted git repository.

meng pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new d35e81f  [SPARK-27454][ML][SQL] Spark image datasource fail when encounter some illegal images
d35e81f is described below

commit d35e81f4bc561598676a508319ec872f7361b069
Author: WeichenXu
AuthorDate: Mon Apr 15 11:55:51 2019 -0700

    [SPARK-27454][ML][SQL] Spark image datasource fail when encounter some illegal images

    ## What changes were proposed in this pull request?

    Fix the Spark image datasource failing on some illegal images. This relates to
    bugs inside `ImageIO.read`, so in the Spark code I add exception handling around it.

    ## How was this patch tested?

    N/A

    Closes #24362 from WeichenXu123/fix_image_ds_bug.

    Authored-by: WeichenXu
    Signed-off-by: Xiangrui Meng
---
 mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala b/mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala
index 0b13eef..a7ddf2f 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala
@@ -133,7 +133,13 @@ object ImageSchema {
    */
   private[spark] def decode(origin: String, bytes: Array[Byte]): Option[Row] = {
 
-    val img = ImageIO.read(new ByteArrayInputStream(bytes))
+    val img = try {
+      ImageIO.read(new ByteArrayInputStream(bytes))
+    } catch {
+      // Catch runtime exceptions because `ImageIO` may throw an unexpected `RuntimeException`,
+      // but do not catch the declared `IOException` (regarded as a FileSystem failure).
+      case _: RuntimeException => null
+    }
 
     if (img == null) {
       None
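The defensive pattern the patch applies can be sketched independently of Spark (hypothetical `safeDecode` helper below, not Spark's actual `ImageSchema.decode`): `ImageIO.read` returns null for byte streams no registered reader understands, but some malformed images instead make it throw a `RuntimeException`; catching only that, and not the declared `IOException` (which still signals a filesystem-level failure), folds both cases into `None`.

```scala
import java.awt.image.BufferedImage
import java.io.ByteArrayInputStream
import javax.imageio.ImageIO

// Decode image bytes, treating both "no reader found" (null result) and
// runtime failures inside ImageIO as an undecodable image.
def safeDecode(bytes: Array[Byte]): Option[BufferedImage] =
  try {
    Option(ImageIO.read(new ByteArrayInputStream(bytes)))
  } catch {
    case _: RuntimeException => None // malformed image tripped a bug inside ImageIO
  }
```

Declared `IOException`s still propagate to the caller, which keeps genuine I/O problems visible instead of silently dropping rows for them.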
[GitHub] [spark-website] srowen commented on a change in pull request #195: [SPARK-27458][Documentation] remind developers to reset maven home in IntelliJ
srowen commented on a change in pull request #195: [SPARK-27458][Documentation] remind developers to reset maven home in IntelliJ
URL: https://github.com/apache/spark-website/pull/195#discussion_r275453837


 ## File path: developer-tools.md ##

 @@ -397,6 +397,18 @@ Other tips:
 
 - "Rebuild Project" can fail the first time the project is compiled, because generate source files are not automatically generated. Try clicking the "Generate Sources and Update Folders For All Projects" button in the "Maven Projects" tool window to manually generate these sources.
 +- Maven bundled in IntelliJ may not meet the minimum version requirement of the Spark. If that happens,

 Review comment:
   "The version of Maven bundled with IntelliJ may not be new enough for Spark. ..."

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services
[GitHub] [spark-website] srowen commented on a change in pull request #195: [SPARK-27458][Documentation] remind developers to reset maven home in IntelliJ
srowen commented on a change in pull request #195: [SPARK-27458][Documentation] remind developers to reset maven home in IntelliJ
URL: https://github.com/apache/spark-website/pull/195#discussion_r275454281


 ## File path: developer-tools.md ##

 @@ -397,6 +397,18 @@ Other tips:
 
 - "Rebuild Project" can fail the first time the project is compiled, because generate source files are not automatically generated. Try clicking the "Generate Sources and Update Folders For All Projects" button in the "Maven Projects" tool window to manually generate these sources.
 +- Maven bundled in IntelliJ may not meet the minimum version requirement of the Spark. If that happens,
 +the action "Generate Sources and Update Folders For All Projects" could fail silently. If you saw error like
 +```
 +2019-04-14 16:05:24,796 [ 314609]   INFO - #org.jetbrains.idea.maven - [WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed with message:
 +Detected Maven Version: 3.3.9 is not in the allowed range 3.6.0.
 +2019-04-14 16:05:24,813 [ 314626]   INFO - #org.jetbrains.idea.maven - org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M2:enforce (enforce-versions) on project spark-parent_2.12: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed.

 Review comment:
   Delete this and the next line; they're not that relevant
[GitHub] [spark-website] srowen commented on a change in pull request #195: [SPARK-27458][Documentation] remind developers to reset maven home in IntelliJ
srowen commented on a change in pull request #195: [SPARK-27458][Documentation] remind developers to reset maven home in IntelliJ
URL: https://github.com/apache/spark-website/pull/195#discussion_r275454211


 ## File path: developer-tools.md ##

 @@ -397,6 +397,18 @@ Other tips:
 
 - "Rebuild Project" can fail the first time the project is compiled, because generate source files are not automatically generated. Try clicking the "Generate Sources and Update Folders For All Projects" button in the "Maven Projects" tool window to manually generate these sources.
 +- Maven bundled in IntelliJ may not meet the minimum version requirement of the Spark. If that happens,
 +the action "Generate Sources and Update Folders For All Projects" could fail silently. If you saw error like
 +```
 +2019-04-14 16:05:24,796 [ 314609]   INFO - #org.jetbrains.idea.maven - [WARNING] Rule 0: org.apache.maven.plugins.enforcer.RequireMavenVersion failed with message:
 +Detected Maven Version: 3.3.9 is not in the allowed range 3.6.0.
 +2019-04-14 16:05:24,813 [ 314626]   INFO - #org.jetbrains.idea.maven - org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M2:enforce (enforce-versions) on project spark-parent_2.12: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed.
 +java.lang.RuntimeException: org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-enforcer-plugin:3.0.0-M2:enforce (enforce-versions) on project spark-parent_2.12: Some Enforcer rules have failed. Look above for specific messages explaining why the rule failed.
 +```
 +in IntelliJ log file (`Help -> Show Log in Finder/Explorer`), you should reset the maven home directory

 Review comment:
   maven -> Maven

   I don't think you need to look at IJ's log files; it's just an update to preferences.
[GitHub] [spark-website] William1104 opened a new pull request #195: remind developers to reset maven home in IntelliJ
William1104 opened a new pull request #195: remind developers to reset maven home in IntelliJ
URL: https://github.com/apache/spark-website/pull/195

   I tried to follow the guide at 'http://spark.apache.org/developer-tools.html' to set up an IntelliJ project for Spark. However, the project failed to build, due to classes missing from the sql/catalyst project that should have been generated via ANTLR, even though I clicked the 'Generate Sources and Update Folders For All Projects' button in IntelliJ as suggested.

   It turned out that I had forgotten to reset the Maven home in my IntelliJ, and IntelliJ failed the 'Generate Sources and Update Folders For All Projects' action silently. That was why the ANTLR4 files were not generated as expected.

   To help other developers, I would like to enhance 'http://spark.apache.org/developer-tools.html' with a note reminding developers to check whether the 'Generate Sources and Update Folders For All Projects' action failed silently due to an incorrect Maven version. If so, they should update the Maven home in IntelliJ accordingly.
[GitHub] [spark-website] srowen commented on a change in pull request #194: Remove links to dead orgs / meetups; fix some broken links
srowen commented on a change in pull request #194: Remove links to dead orgs / meetups; fix some broken links
URL: https://github.com/apache/spark-website/pull/194#discussion_r275397031


 ## File path: developer-tools.md ##

 @@ -463,25 +463,16 @@ in the Eclipse install directory. Increase the following setting as needed:
 
 Nightly Builds
 
-Packages are built regularly off of Spark's master branch and release branches. These provide
-Spark developers access to the bleeding-edge of Spark master or the most recent fixes not yet
-incorporated into a maintenance release. These should only be used by Spark developers, as they
-may have bugs and have not undergone the same level of testing as releases. Spark nightly packages
-are available at:
-
-- Latest master build: <a href="https://people.apache.org/~pwendell/spark-nightly/spark-master-bin/latest">https://people.apache.org/~pwendell/spark-nightly/spark-master-bin/latest</a>
-- All nightly builds: <a href="https://people.apache.org/~pwendell/spark-nightly/">https://people.apache.org/~pwendell/spark-nightly/</a>
-
-Spark also publishes SNAPSHOT releases of its Maven artifacts for both master and maintenance
+Spark publishes SNAPSHOT releases of its Maven artifacts for both master and maintenance

 Review comment:
   We don't publish nightly builds anymore.
[GitHub] [spark-website] srowen commented on a change in pull request #194: Remove links to dead orgs / meetups; fix some broken links
srowen commented on a change in pull request #194: Remove links to dead orgs / meetups; fix some broken links
URL: https://github.com/apache/spark-website/pull/194#discussion_r275397252


 ## File path: powered-by.md ##

 @@ -47,16 +47,13 @@ initially launched Spark
 
 - <a href="http://alluxio.com/">Alluxio</a> - Alluxio, formerly Tachyon, is the world's first
 system that unifies disparate storage systems at memory speed.
-- <a href="http://alpinenow.com/">Alpine Data Labs</a>

 Review comment:
   The removed orgs don't exist anymore
[GitHub] [spark-website] srowen opened a new pull request #194: Remove links to dead orgs / meetups; fix some broken links
srowen opened a new pull request #194: Remove links to dead orgs / meetups; fix some broken links
URL: https://github.com/apache/spark-website/pull/194
[spark] branch master updated: [SPARK-27444][SQL][FOLLOWUP][MINOR][TEST] Add a test for describing multi select query.
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 3ab96d7  [SPARK-27444][SQL][FOLLOWUP][MINOR][TEST] Add a test for describing multi select query.
3ab96d7 is described below

commit 3ab96d7acf870e53c9016b0b63d0b328eec23bed
Author: Dilip Biswal
AuthorDate: Mon Apr 15 21:26:45 2019 +0800

    [SPARK-27444][SQL][FOLLOWUP][MINOR][TEST] Add a test for describing multi select query.

    ## What changes were proposed in this pull request?

    This is a minor PR to add a test describing a multi-select query.

    ## How was this patch tested?

    Added a test in describe-query.sql.

    Closes #24370 from dilipbiswal/describe-query-multiselect-test.

    Authored-by: Dilip Biswal
    Signed-off-by: Wenchen Fan
---
 .../spark/sql/execution/command/tables.scala       |  4 ++-
 .../resources/sql-tests/inputs/describe-query.sql  |  6 ++--
 .../sql-tests/results/describe-query.sql.out       | 39 ++++++++++++++--------
 3 files changed, 30 insertions(+), 19 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
index fb619a7..b31b2d3 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
@@ -635,7 +635,9 @@ case class DescribeTableCommand(
  * 3. VALUES statement.
  * 4. TABLE statement. Example : TABLE table_name
  * 5. statements of the form 'FROM table SELECT *'
- * 6. Common table expressions (CTEs)
+ * 6. Multi select statements of the following form:
+ *    select * from (from a select * select *)
+ * 7. Common table expressions (CTEs)
  */
 case class DescribeQueryCommand(query: LogicalPlan) extends DescribeCommandBase {
 
diff --git a/sql/core/src/test/resources/sql-tests/inputs/describe-query.sql b/sql/core/src/test/resources/sql-tests/inputs/describe-query.sql
index bc144d0..b6351f9 100644
--- a/sql/core/src/test/resources/sql-tests/inputs/describe-query.sql
+++ b/sql/core/src/test/resources/sql-tests/inputs/describe-query.sql
@@ -10,11 +10,11 @@ DESC SELECT 10.00D as col1;
 DESC QUERY SELECT key FROM desc_temp1 UNION ALL select CAST(1 AS DOUBLE);
 DESC QUERY VALUES(1.00D, 'hello') as tab1(col1, col2);
 DESC QUERY FROM desc_temp1 a SELECT *;
-
-
--- Error cases.
 DESC WITH s AS (SELECT 'hello' as col1) SELECT * FROM s;
 DESCRIBE QUERY WITH s AS (SELECT * from desc_temp1) SELECT * FROM s;
+DESCRIBE SELECT * FROM (FROM desc_temp2 select * select *);
+
+-- Error cases.
 DESCRIBE INSERT INTO desc_temp1 values (1, 'val1');
 DESCRIBE INSERT INTO desc_temp1 SELECT * FROM desc_temp2;
 DESCRIBE
diff --git a/sql/core/src/test/resources/sql-tests/results/describe-query.sql.out b/sql/core/src/test/resources/sql-tests/results/describe-query.sql.out
index fc51b46..15a346f 100644
--- a/sql/core/src/test/resources/sql-tests/results/describe-query.sql.out
+++ b/sql/core/src/test/resources/sql-tests/results/describe-query.sql.out
@@ -1,5 +1,5 @@
 -- Automatically generated by SQLQueryTestSuite
--- Number of queries: 16
+-- Number of queries: 17


 -- !query 0
@@ -97,10 +97,19 @@
 val	string


 -- !query 11
-DESCRIBE INSERT INTO desc_temp1 values (1, 'val1')
+DESCRIBE SELECT * FROM (FROM desc_temp2 select * select *)
 -- !query 11 schema
-struct<>
+struct<col_name:string,data_type:string,comment:string>
 -- !query 11 output
+key	int
+val	string
+
+
+-- !query 12
+DESCRIBE INSERT INTO desc_temp1 values (1, 'val1')
+-- !query 12 schema
+struct<>
+-- !query 12 output
 org.apache.spark.sql.catalyst.parser.ParseException

 mismatched input 'desc_temp1' expecting {, '.'}(line 1, pos 21)

 == SQL ==
 DESCRIBE INSERT INTO desc_temp1 values (1, 'val1')
 ---------------------^^^


--- !query 12
+-- !query 13
 DESCRIBE INSERT INTO desc_temp1 SELECT * FROM desc_temp2
--- !query 12 schema
+-- !query 13 schema
 struct<>
--- !query 12 output
+-- !query 13 output
 org.apache.spark.sql.catalyst.parser.ParseException

 mismatched input 'desc_temp1' expecting {, '.'}(line 1, pos 21)

 == SQL ==
 DESCRIBE INSERT INTO desc_temp1 SELECT * FROM desc_temp2
 ---------------------^^^


--- !query 13
+-- !query 14
 DESCRIBE FROM desc_temp1 a insert into desc_temp1 select * insert into desc_temp2 select *
--- !query 13 schema
+-- !query 14 schema
 struct<>
--- !query 13 output
+-- !query 14 output
 org.apache.spark.sql.catalyst.parser.ParseException

 mismatched input 'insert' expecting {, '(', ',', 'ANTI', 'CLUSTER', 'CROSS', 'DISTRIBUTE', 'EXCEPT', 'FULL', 'GROUP', 'HAVING', 'INNER', 'INTERSECT', 'JOIN', 'LATERAL', 'LEFT', 'LIMIT', 'NATURAL', 'ORDER', 'PIVOT', 'RIGHT', 'SELECT', 'SEMI',
[spark] branch master updated: [SPARK-27459][SQL] Revise the exception message of schema inference failure in file source V2
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 27d625d  [SPARK-27459][SQL] Revise the exception message of schema inference failure in file source V2
27d625d is described below

commit 27d625d785244ae78287e3a0eede44c79dfcbb92
Author: Gengliang Wang
AuthorDate: Mon Apr 15 21:06:03 2019 +0800

    [SPARK-27459][SQL] Revise the exception message of schema inference failure in file source V2

    ## What changes were proposed in this pull request?

    Since https://github.com/apache/spark/pull/23383/files#diff-db4a140579c1ac4b1dbec7fe5057eecaR36,
    the exception message for a schema inference failure in file source V2 uses `tableName`,
    which is equivalent to `shortName + path`, while in file source V1 the message is
    `Unable to infer schema from ORC/CSV/JSON...`. We should make the V2 message consistent
    with V1, so that the related test cases don't need to be modified in the future migration.
    https://github.com/apache/spark/pull/24058#pullrequestreview-226364350

    ## How was this patch tested?

    Revert the unit test cases modified in
    https://github.com/apache/spark/pull/24005/files#diff-b9ddfbc9be8d83ecf100b3b8ff9610b9R431 and
    https://github.com/apache/spark/pull/23383/files#diff-9ab56940ee5a53f2bb81e3c008653362R577,
    and test with them.

    Closes #24369 from gengliangwang/reviseInferSchemaMessage.

    Authored-by: Gengliang Wang
    Signed-off-by: Wenchen Fan
---
 .../scala/org/apache/spark/sql/execution/datasources/v2/FileTable.scala | 2 +-
 .../org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala  | 2 +-
 .../scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala    | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileTable.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileTable.scala
index cb816d6..c0c57b8 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileTable.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileTable.scala
@@ -54,7 +54,7 @@ abstract class FileTable(
       inferSchema(fileIndex.allFiles())
     }.getOrElse {
       throw new AnalysisException(
-        s"Unable to infer schema for $name. It must be specified manually.")
+        s"Unable to infer schema for $formatName. It must be specified manually.")
     }.asNullable

   override lazy val schema: StructType = {
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala
index fe40b9a..18ec3e3 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/orc/OrcQuerySuite.scala
@@ -580,7 +580,7 @@ abstract class OrcQueryTest extends OrcTest {
     val m1 = intercept[AnalysisException] {
       testAllCorruptFiles()
     }.getMessage
-    assert(m1.contains("Unable to infer schema"))
+    assert(m1.contains("Unable to infer schema for ORC"))
     testAllCorruptFilesWithoutSchemaInfer()
   }

diff --git a/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
index 2569085..9f96947 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala
@@ -428,7 +428,7 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with Be
     val message = intercept[AnalysisException] {
       testRead(spark.read.csv(), Seq.empty, schema)
     }.getMessage
-    assert(message.toLowerCase(Locale.ROOT).contains("unable to infer schema for csv"))
+    assert(message.contains("Unable to infer schema for CSV. It must be specified manually."))

     testRead(spark.read.csv(dir), data, schema)
     testRead(spark.read.csv(dir, dir), data ++ data, schema)
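The change boils down to which name the fallback error message interpolates. Below is a minimal sketch of the pattern as a free function (hypothetical `resolveSchema`, not Spark's `FileTable` API): prefer a user-specified schema, then inference, and only then fail with a message that names the format ("CSV", "ORC", ...) rather than the table name/path, matching file source V1.

```scala
// Resolve a schema the way the patched FileTable does: a user-specified schema
// wins, otherwise inference, otherwise an error naming the file format.
def resolveSchema(userSpecified: Option[String],
                  inferred: => Option[String],
                  formatName: String): String =
  userSpecified.orElse(inferred).getOrElse {
    throw new IllegalArgumentException(
      s"Unable to infer schema for $formatName. It must be specified manually.")
  }
```

Spark throws `AnalysisException` here; `IllegalArgumentException` stands in to keep the sketch dependency-free. The by-name `inferred` parameter mirrors the lazy `Try { inferSchema(...) }` in the real code, so inference only runs when no schema was supplied.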