[spark] branch branch-3.1 updated: [SPARK-33669] Wrong error message from YARN application state monitor when sc.stop in yarn client mode
This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 57ed4df  [SPARK-33669] Wrong error message from YARN application state monitor when sc.stop in yarn client mode
57ed4df is described below

commit 57ed4df53e3dba01a20ab38ba8098b48594054fd
Author: suqilong
AuthorDate: Wed Dec 9 01:21:13 2020 -0600

    [SPARK-33669] Wrong error message from YARN application state monitor when sc.stop in yarn client mode

    ### What changes were proposed in this pull request?
    This change makes InterruptedIOException be treated like InterruptedException when closing YarnClientSchedulerBackend, so that no "YARN application has exited unexpectedly ..." error is logged.

    ### Why are the changes needed?
    In YARN client mode, stopping YarnClientSchedulerBackend first interrupts the YARN application monitor thread. MonitorThread.run() catches InterruptedException so that it can respond gracefully to the stop request. However, client.monitorApplication can also throw InterruptedIOException if a Hadoop RPC call is in flight when the interrupt arrives. In that case the monitor thread does not realize it was interrupted: it reports a failed YARN application instead, and "Failed to contact YARN for application x; YARN application has exited unexpectedly with state x" is logged at error level, which is very confusing for users.

    ### Does this PR introduce _any_ user-facing change?
    Yes.

    ### How was this patch tested?
    Very simple patch; no new test seems necessary.

    Closes #30617 from sqlwindspeaker/yarn-client-interrupt-monitor.
Authored-by: suqilong
Signed-off-by: Mridul Muralidharan gmail.com>
(cherry picked from commit 48f93af9f3d40de5bf087eb1a06c1b9954b2ad76)
Signed-off-by: Mridul Muralidharan
---
 .../yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala    | 2 +-
 .../apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala  | 5 -
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
index 7f791e0..618faef 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
@@ -1069,7 +1069,7 @@ private[spark] class Client(
         logError(s"Application $appId not found.")
         cleanupStagingDir()
         return YarnAppReport(YarnApplicationState.KILLED, FinalApplicationStatus.KILLED, None)
-      case NonFatal(e) =>
+      case NonFatal(e) if !e.isInstanceOf[InterruptedIOException] =>
         val msg = s"Failed to contact YARN for application $appId."
         logError(msg, e)
         // Don't necessarily clean up staging dir because status is unknown
diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
index cb0de5a..8a55e61 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
@@ -17,6 +17,8 @@
 
 package org.apache.spark.scheduler.cluster
 
+import java.io.InterruptedIOException
+
 import scala.collection.mutable.ArrayBuffer
 
 import org.apache.hadoop.yarn.api.records.YarnApplicationState
@@ -121,7 +123,8 @@ private[spark] class YarnClientSchedulerBackend(
           allowInterrupt = false
           sc.stop()
         } catch {
-          case e: InterruptedException => logInfo("Interrupting monitor thread")
+          case _: InterruptedException | _: InterruptedIOException =>
+            logInfo("Interrupting monitor thread")
         }
       }

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
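The widened catch clause in the patch above can be illustrated in isolation. The sketch below is a simplified stand-in for the monitor loop, not the actual Spark code; the names `runMonitor` and `poll` are invented for illustration:

```scala
import java.io.InterruptedIOException

// Simplified stand-in for the monitor loop: `poll` represents the
// client.monitorApplication RPC call, which may surface an interrupt
// either as InterruptedException or as InterruptedIOException.
def runMonitor(poll: () => Unit): String = {
  try {
    poll()
    "finished"
  } catch {
    // The fix: match both interrupt signals in one case clause, so an
    // interrupted RPC is treated as a graceful stop, not a YARN failure.
    case _: InterruptedException | _: InterruptedIOException =>
      "interrupted"
  }
}

// An interrupt surfacing from blocked I/O is now handled gracefully.
println(runMonitor(() => throw new InterruptedIOException("RPC interrupted")))
```

Before the patch, only `InterruptedException` matched, so an `InterruptedIOException` escaped to the generic error path.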
[spark] branch master updated: [SPARK-33669] Wrong error message from YARN application state monitor when sc.stop in yarn client mode
This is an automated email from the ASF dual-hosted git repository.

mridulm80 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new 48f93af  [SPARK-33669] Wrong error message from YARN application state monitor when sc.stop in yarn client mode
48f93af is described below

commit 48f93af9f3d40de5bf087eb1a06c1b9954b2ad76
Author: suqilong
AuthorDate: Wed Dec 9 01:21:13 2020 -0600

    [SPARK-33669] Wrong error message from YARN application state monitor when sc.stop in yarn client mode

    ### What changes were proposed in this pull request?
    This change makes InterruptedIOException be treated like InterruptedException when closing YarnClientSchedulerBackend, so that no "YARN application has exited unexpectedly ..." error is logged.

    ### Why are the changes needed?
    In YARN client mode, stopping YarnClientSchedulerBackend first interrupts the YARN application monitor thread. MonitorThread.run() catches InterruptedException so that it can respond gracefully to the stop request. However, client.monitorApplication can also throw InterruptedIOException if a Hadoop RPC call is in flight when the interrupt arrives. In that case the monitor thread does not realize it was interrupted: it reports a failed YARN application instead, and "Failed to contact YARN for application x; YARN application has exited unexpectedly with state x" is logged at error level, which is very confusing for users.

    ### Does this PR introduce _any_ user-facing change?
    Yes.

    ### How was this patch tested?
    Very simple patch; no new test seems necessary.

    Closes #30617 from sqlwindspeaker/yarn-client-interrupt-monitor.
Authored-by: suqilong
Signed-off-by: Mridul Muralidharan gmail.com>
---
 .../yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala    | 2 +-
 .../apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala  | 5 -
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
index 7f791e0..618faef 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
@@ -1069,7 +1069,7 @@ private[spark] class Client(
         logError(s"Application $appId not found.")
         cleanupStagingDir()
         return YarnAppReport(YarnApplicationState.KILLED, FinalApplicationStatus.KILLED, None)
-      case NonFatal(e) =>
+      case NonFatal(e) if !e.isInstanceOf[InterruptedIOException] =>
         val msg = s"Failed to contact YARN for application $appId."
         logError(msg, e)
         // Don't necessarily clean up staging dir because status is unknown
diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
index cb0de5a..8a55e61 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala
@@ -17,6 +17,8 @@
 
 package org.apache.spark.scheduler.cluster
 
+import java.io.InterruptedIOException
+
 import scala.collection.mutable.ArrayBuffer
 
 import org.apache.hadoop.yarn.api.records.YarnApplicationState
@@ -121,7 +123,8 @@ private[spark] class YarnClientSchedulerBackend(
           allowInterrupt = false
           sc.stop()
         } catch {
-          case e: InterruptedException => logInfo("Interrupting monitor thread")
+          case _: InterruptedException | _: InterruptedIOException =>
+            logInfo("Interrupting monitor thread")
         }
       }
[spark] branch branch-3.1 updated: [SPARK-33641][SQL][DOC][FOLLOW-UP] Add migration guide for CHAR VARCHAR types
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new fa50fa1  [SPARK-33641][SQL][DOC][FOLLOW-UP] Add migration guide for CHAR VARCHAR types
fa50fa1 is described below

commit fa50fa1bc17c1904121b35bad381063d9c7c8e70
Author: Kent Yao
AuthorDate: Wed Dec 9 06:44:10 2020 +

    [SPARK-33641][SQL][DOC][FOLLOW-UP] Add migration guide for CHAR VARCHAR types

    ### What changes were proposed in this pull request?
    Add a migration guide entry for the CHAR/VARCHAR types.

    ### Why are the changes needed?
    For migration.

    ### Does this PR introduce _any_ user-facing change?
    Doc change only.

    ### How was this patch tested?
    Passing CI.

    Closes #30654 from yaooqinn/SPARK-33641-F.

Authored-by: Kent Yao
Signed-off-by: Wenchen Fan
(cherry picked from commit c88eddac3bf860d04bba91fc913f8b2069a94153)
Signed-off-by: Wenchen Fan
---
 docs/sql-migration-guide.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 2c86e7a..2bc04a0 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -54,6 +54,8 @@ license: |
 
   - In Spark 3.1, creating or altering a view will capture runtime SQL configs and store them as view properties. These configs will be applied during the parsing and analysis phases of the view resolution. To restore the behavior before Spark 3.1, you can set `spark.sql.legacy.useCurrentConfigsForView` to `true`.
 
+  - Since Spark 3.1, CHAR/CHARACTER and VARCHAR types are supported in the table schema. Table scan/insertion will respect the char/varchar semantic. If char/varchar is used in places other than table schema, an exception will be thrown (CAST is an exception that simply treats char/varchar as string like before). To restore the behavior before Spark 3.1, which treats them as STRING types and ignores a length parameter, e.g. `CHAR(4)`, you can set `spark.sql.legacy.charVarcharAsString` to [...]
+
 ## Upgrading from Spark SQL 3.0 to 3.0.1
 
   - In Spark 3.0, JSON datasource and JSON function `schema_of_json` infer TimestampType from string values if they match to the pattern defined by the JSON option `timestampFormat`. Since version 3.0.1, the timestamp type inference is disabled by default. Set the JSON option `inferTimestamp` to `true` to enable such type inference.
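As a rough illustration of the char/varchar semantic described in the new migration-guide entry, the sketch below models CHAR(n) padding in plain Scala. Both helpers (`charWrite`, `legacyWrite`) are invented for illustration and are not Spark APIs; this is only a model of the behavior, not Spark's implementation:

```scala
// Invented model of the CHAR(n) semantic: values are space-padded to
// the declared length when stored in a CHAR(n) column.
def charWrite(value: String, n: Int): String = value.padTo(n, ' ')

// Invented model of the legacy behavior restored by setting
// spark.sql.legacy.charVarcharAsString: CHAR(n) is treated as STRING
// and the length parameter is ignored.
def legacyWrite(value: String): String = value

println(charWrite("ab", 4))        // "ab  " -- padded to length 4
println(legacyWrite("ab").length)  // 2 -- plain STRING behavior
```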
[spark] branch master updated (29fed23 -> c88edda)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 29fed23  [SPARK-33703][SQL] Migrate MSCK REPAIR TABLE to use UnresolvedTable to resolve the identifier
     add c88edda  [SPARK-33641][SQL][DOC][FOLLOW-UP] Add migration guide for CHAR VARCHAR types

No new revisions were added by this update.

Summary of changes:
 docs/sql-migration-guide.md | 2 ++
 1 file changed, 2 insertions(+)
[spark] branch master updated (f021f6d -> 29fed23)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from f021f6d  [MINOR][ML] Increase Bounded MLOR (without regularization) test error tolerance
     add 29fed23  [SPARK-33703][SQL] Migrate MSCK REPAIR TABLE to use UnresolvedTable to resolve the identifier

No new revisions were added by this update.

Summary of changes:
 .../scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala  | 5 +++--
 .../org/apache/spark/sql/catalyst/plans/logical/statements.scala | 5 -
 .../org/apache/spark/sql/catalyst/plans/logical/v2Commands.scala | 7 +++
 .../org/apache/spark/sql/catalyst/parser/DDLParserSuite.scala    | 2 +-
 .../spark/sql/catalyst/analysis/ResolveSessionCatalog.scala      | 7 ++-
 .../sql/execution/datasources/v2/DataSourceV2Strategy.scala      | 3 +++
 .../org/apache/spark/sql/connector/DataSourceV2SQLSuite.scala    | 9 +
 7 files changed, 17 insertions(+), 21 deletions(-)
[spark] branch branch-3.1 updated: [MINOR][ML] Increase Bounded MLOR (without regularization) test error tolerance
This is an automated email from the ASF dual-hosted git repository.

weichenxu123 pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new b0a70ab  [MINOR][ML] Increase Bounded MLOR (without regularization) test error tolerance
b0a70ab is described below

commit b0a70abed383cedd59f28cec75aa898df0c0b4bd
Author: Weichen Xu
AuthorDate: Wed Dec 9 11:18:09 2020 +0800

    [MINOR][ML] Increase Bounded MLOR (without regularization) test error tolerance

    ### What changes were proposed in this pull request?
    Loosen the LogisticRegression test error tolerance.

    ### Why are the changes needed?
    When we switch BLAS versions, some of the tests fail because the error tolerance in the tests is too strict.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    N/A

    Closes #30587 from WeichenXu123/fix_lor_test.

Authored-by: Weichen Xu
Signed-off-by: Weichen Xu
(cherry picked from commit f021f6d3c72e1c84637798b4ddcb7e208fdfbf46)
Signed-off-by: Weichen Xu
---
 .../ml/classification/LogisticRegressionSuite.scala | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala b/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
index d0b282d..d2814b4 100644
--- a/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/ml/classification/LogisticRegressionSuite.scala
@@ -1548,9 +1548,9 @@ class LogisticRegressionSuite extends MLTest with DefaultReadWriteTest {
     val interceptsExpected1 = Vectors.dense(
       1.152482448372, 3.591773288423673, 5.079685953744937)
 
-    checkCoefficientsEquivalent(model1.coefficientMatrix, coefficientsExpected1)
+    checkBoundedMLORCoefficientsEquivalent(model1.coefficientMatrix, coefficientsExpected1)
     assert(model1.interceptVector ~== interceptsExpected1 relTol 0.01)
-    checkCoefficientsEquivalent(model2.coefficientMatrix, coefficientsExpected1)
+    checkBoundedMLORCoefficientsEquivalent(model2.coefficientMatrix, coefficientsExpected1)
     assert(model2.interceptVector ~== interceptsExpected1 relTol 0.01)
 
     // Bound constrained optimization with bound on both side.
@@ -1585,9 +1585,9 @@ class LogisticRegressionSuite extends MLTest with DefaultReadWriteTest {
       isTransposed = true)
     val interceptsExpected3 = Vectors.dense(1.0, 2.0, 2.0)
 
-    checkCoefficientsEquivalent(model3.coefficientMatrix, coefficientsExpected3)
+    checkBoundedMLORCoefficientsEquivalent(model3.coefficientMatrix, coefficientsExpected3)
     assert(model3.interceptVector ~== interceptsExpected3 relTol 0.01)
-    checkCoefficientsEquivalent(model4.coefficientMatrix, coefficientsExpected3)
+    checkBoundedMLORCoefficientsEquivalent(model4.coefficientMatrix, coefficientsExpected3)
     assert(model4.interceptVector ~== interceptsExpected3 relTol 0.01)
 
     // Bound constrained optimization with infinite bound on both side.
@@ -1621,9 +1621,9 @@ class LogisticRegressionSuite extends MLTest with DefaultReadWriteTest {
     val interceptsExpected5 = Vectors.dense(
       -2.2231282183460723, 0.3669496747012527, 1.856178543644802)
 
-    checkCoefficientsEquivalent(model5.coefficientMatrix, coefficientsExpected5)
+    checkBoundedMLORCoefficientsEquivalent(model5.coefficientMatrix, coefficientsExpected5)
     assert(model5.interceptVector ~== interceptsExpected5 relTol 0.01)
-    checkCoefficientsEquivalent(model6.coefficientMatrix, coefficientsExpected5)
+    checkBoundedMLORCoefficientsEquivalent(model6.coefficientMatrix, coefficientsExpected5)
     assert(model6.interceptVector ~== interceptsExpected5 relTol 0.01)
   }
 
@@ -1719,9 +1719,9 @@ class LogisticRegressionSuite extends MLTest with DefaultReadWriteTest {
       1.7524631428961193, 1.2292565990448736, 1.3433784431904323, 1.5846063017678864),
       isTransposed = true)
 
-    checkCoefficientsEquivalent(model1.coefficientMatrix, coefficientsExpected)
+    checkBoundedMLORCoefficientsEquivalent(model1.coefficientMatrix, coefficientsExpected)
     assert(model1.interceptVector.toArray === Array.fill(3)(0.0))
-    checkCoefficientsEquivalent(model2.coefficientMatrix, coefficientsExpected)
+    checkBoundedMLORCoefficientsEquivalent(model2.coefficientMatrix, coefficientsExpected)
     assert(model2.interceptVector.toArray === Array.fill(3)(0.0))
   }
 
@@ -2953,16 +2953,17 @@ object LogisticRegressionSuite {
   }
 
   /**
+   * Note: This method is only used in Bounded MLOR (without regularization) test
    * When no regularization is applied, the multinomial coefficients lack identifiability
    * because we do not use a pivot class. We can add any constant value to
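The `~== ... relTol 0.01` assertions loosened above come from Spark's test utilities. A minimal sketch of what a relative-tolerance comparison does (an assumption about its shape for illustration, not the actual Spark implementation):

```scala
// Sketch of a relative-tolerance check: two values agree if their
// difference is small relative to their magnitude.
def relTolEq(a: Double, b: Double, tol: Double): Boolean = {
  val diff = math.abs(a - b)
  val scale = math.max(math.abs(a), math.abs(b))
  if (scale == 0.0) diff == 0.0 else diff / scale <= tol
}

// Widening the tolerance makes assertions robust to the small numeric
// differences that appear when switching BLAS implementations.
println(relTolEq(1.0000, 1.0050, 0.01)) // within 1% => true
println(relTolEq(1.0, 1.1, 0.01))       // 10% apart => false
```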
[spark] branch master updated (3ac70f1 -> f021f6d)
This is an automated email from the ASF dual-hosted git repository.

weichenxu123 pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 3ac70f1  [SPARK-33695][BUILD] Upgrade to jackson to 2.10.5 and jackson-databind to 2.10.5.1
     add f021f6d  [MINOR][ML] Increase Bounded MLOR (without regularization) test error tolerance

No new revisions were added by this update.

Summary of changes:
 .../ml/classification/LogisticRegressionSuite.scala | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)
[spark] branch master updated (6fd2345 -> 3ac70f1)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 6fd2345  [SPARK-32110][SQL] normalize special floating numbers in HyperLogLog++
     add 3ac70f1  [SPARK-33695][BUILD] Upgrade to jackson to 2.10.5 and jackson-databind to 2.10.5.1

No new revisions were added by this update.

Summary of changes:
 dev/deps/spark-deps-hadoop-2.7-hive-2.3 | 16 
 dev/deps/spark-deps-hadoop-3.2-hive-2.3 | 16 
 pom.xml                                 |  5 +++--
 3 files changed, 19 insertions(+), 18 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-32110][SQL] normalize special floating numbers in HyperLogLog++
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new eae6a3e  [SPARK-32110][SQL] normalize special floating numbers in HyperLogLog++
eae6a3e is described below

commit eae6a3e9dc75912e2fbe80d86f05cce629de8022
Author: Wenchen Fan
AuthorDate: Tue Dec 8 11:41:35 2020 -0800

    [SPARK-32110][SQL] normalize special floating numbers in HyperLogLog++

    ### What changes were proposed in this pull request?
    Currently, Spark treats 0.0 and -0.0 as semantically equal, while it still retains the difference between them so that users can see -0.0 when displaying the data set. The comparison expressions in Spark take care of the special floating numbers and implement the correct semantic. However, Spark doesn't always use these comparison expressions to compare values, and we need to normalize the special floating numbers before comparing them in these places:
    1. GROUP BY
    2. join keys
    3. window partition keys

    This PR fixes one more place that compares values without using comparison expressions: HyperLogLog++

    ### Why are the changes needed?
    Fix the query result.

    ### Does this PR introduce _any_ user-facing change?
    Yes, the result of HyperLogLog++ becomes correct now.

    ### How was this patch tested?
    A new test case, and a few more test cases that pass before this PR, to improve test coverage.

    Closes #30673 from cloud-fan/bug.
Authored-by: Wenchen Fan
Signed-off-by: Dongjoon Hyun
(cherry picked from commit 6fd234503cf1e85715ccd3bda42f29dae1daa71b)
Signed-off-by: Dongjoon Hyun
---
 .../optimizer/NormalizeFloatingNumbers.scala       | 45 ++-
 .../catalyst/util/HyperLogLogPlusPlusHelper.scala  |  8 +-
 .../sql/catalyst/expressions/PredicateSuite.scala  | 90 ++
 .../aggregate/HyperLogLogPlusPlusSuite.scala       | 24 +-
 4 files changed, 144 insertions(+), 23 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala
index f0cf671..59265221 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala
@@ -134,6 +134,28 @@ object NormalizeFloatingNumbers extends Rule[LogicalPlan] {
     case _ => throw new IllegalStateException(s"fail to normalize $expr")
   }
+
+  val FLOAT_NORMALIZER: Any => Any = (input: Any) => {
+    val f = input.asInstanceOf[Float]
+    if (f.isNaN) {
+      Float.NaN
+    } else if (f == -0.0f) {
+      0.0f
+    } else {
+      f
+    }
+  }
+
+  val DOUBLE_NORMALIZER: Any => Any = (input: Any) => {
+    val d = input.asInstanceOf[Double]
+    if (d.isNaN) {
+      Double.NaN
+    } else if (d == -0.0d) {
+      0.0d
+    } else {
+      d
+    }
+  }
 }
 
 case class NormalizeNaNAndZero(child: Expression) extends UnaryExpression with ExpectsInputTypes {
@@ -143,27 +165,8 @@ case class NormalizeNaNAndZero(child: Expression) extends UnaryExpression with E
   override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(FloatType, DoubleType))
 
   private lazy val normalizer: Any => Any = child.dataType match {
-    case FloatType => (input: Any) => {
-      val f = input.asInstanceOf[Float]
-      if (f.isNaN) {
-        Float.NaN
-      } else if (f == -0.0f) {
-        0.0f
-      } else {
-        f
-      }
-    }
-
-    case DoubleType => (input: Any) => {
-      val d = input.asInstanceOf[Double]
-      if (d.isNaN) {
-        Double.NaN
-      } else if (d == -0.0d) {
-        0.0d
-      } else {
-        d
-      }
-    }
+    case FloatType => NormalizeFloatingNumbers.FLOAT_NORMALIZER
+    case DoubleType => NormalizeFloatingNumbers.DOUBLE_NORMALIZER
   }
 
   override def nullSafeEval(input: Any): Any = {
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/HyperLogLogPlusPlusHelper.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/HyperLogLogPlusPlusHelper.scala
index ea619c6..6471a74 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/HyperLogLogPlusPlusHelper.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/HyperLogLogPlusPlusHelper.scala
@@ -22,6 +22,7 @@ import java.util
 
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions.XxHash64Function
+import org.apache.spark.sql.catalyst.optimizer.NormalizeFloatingNumbers.{DOUBLE_NORMALIZER, FLOAT_NORMALIZER}
 import org.apache.spark.sql.types._
 
 // A helper class for HyperLogLogPlusPlus.
@@
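The FLOAT_NORMALIZER / DOUBLE_NORMALIZER functions added in the diff above can be reproduced standalone. This sketch mirrors the logic from the patch, lifted out of Spark's classes for illustration:

```scala
// Mirrors the normalizers in NormalizeFloatingNumbers: collapse every
// NaN to the canonical NaN and -0.0 to 0.0, so that value-based
// hashing/grouping (e.g. in HyperLogLog++) treats them as identical.
val floatNormalizer: Float => Float = f =>
  if (f.isNaN) Float.NaN else if (f == -0.0f) 0.0f else f

val doubleNormalizer: Double => Double = d =>
  if (d.isNaN) Double.NaN else if (d == -0.0d) 0.0d else d

// -0.0 and 0.0 compare equal (==) but have different bit patterns,
// which is why bit-level hashing must see the normalized value.
val bitsBefore = java.lang.Double.doubleToRawLongBits(-0.0d)
val bitsAfter  = java.lang.Double.doubleToRawLongBits(doubleNormalizer(-0.0d))
println(bitsBefore != bitsAfter) // true: -0.0 was rewritten to 0.0
```

Without this normalization, 0.0 and -0.0 hash to different HLL++ registers and are counted as distinct values.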
[spark] branch branch-3.1 updated: [SPARK-32110][SQL] normalize special floating numbers in HyperLogLog++
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 1093c0f  [SPARK-32110][SQL] normalize special floating numbers in HyperLogLog++
1093c0f is described below

commit 1093c0f3cbd107eccba77dc39dae60e5019578cc
Author: Wenchen Fan
AuthorDate: Tue Dec 8 11:41:35 2020 -0800

    [SPARK-32110][SQL] normalize special floating numbers in HyperLogLog++

    ### What changes were proposed in this pull request?
    Currently, Spark treats 0.0 and -0.0 as semantically equal, while it still retains the difference between them so that users can see -0.0 when displaying the data set. The comparison expressions in Spark take care of the special floating numbers and implement the correct semantic. However, Spark doesn't always use these comparison expressions to compare values, and we need to normalize the special floating numbers before comparing them in these places:
    1. GROUP BY
    2. join keys
    3. window partition keys

    This PR fixes one more place that compares values without using comparison expressions: HyperLogLog++

    ### Why are the changes needed?
    Fix the query result.

    ### Does this PR introduce _any_ user-facing change?
    Yes, the result of HyperLogLog++ becomes correct now.

    ### How was this patch tested?
    A new test case, and a few more test cases that pass before this PR, to improve test coverage.

    Closes #30673 from cloud-fan/bug.
Authored-by: Wenchen Fan
Signed-off-by: Dongjoon Hyun
(cherry picked from commit 6fd234503cf1e85715ccd3bda42f29dae1daa71b)
Signed-off-by: Dongjoon Hyun
---
 .../optimizer/NormalizeFloatingNumbers.scala       | 45 ++-
 .../catalyst/util/HyperLogLogPlusPlusHelper.scala  |  8 +-
 .../sql/catalyst/expressions/PredicateSuite.scala  | 90 ++
 .../aggregate/HyperLogLogPlusPlusSuite.scala       | 24 +-
 4 files changed, 144 insertions(+), 23 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala
index 4434c29..ac8766c 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/NormalizeFloatingNumbers.scala
@@ -143,6 +143,28 @@ object NormalizeFloatingNumbers extends Rule[LogicalPlan] {
     case _ => throw new IllegalStateException(s"fail to normalize $expr")
   }
+
+  val FLOAT_NORMALIZER: Any => Any = (input: Any) => {
+    val f = input.asInstanceOf[Float]
+    if (f.isNaN) {
+      Float.NaN
+    } else if (f == -0.0f) {
+      0.0f
+    } else {
+      f
+    }
+  }
+
+  val DOUBLE_NORMALIZER: Any => Any = (input: Any) => {
+    val d = input.asInstanceOf[Double]
+    if (d.isNaN) {
+      Double.NaN
+    } else if (d == -0.0d) {
+      0.0d
+    } else {
+      d
+    }
+  }
 }
 
 case class NormalizeNaNAndZero(child: Expression) extends UnaryExpression with ExpectsInputTypes {
@@ -152,27 +174,8 @@ case class NormalizeNaNAndZero(child: Expression) extends UnaryExpression with E
   override def inputTypes: Seq[AbstractDataType] = Seq(TypeCollection(FloatType, DoubleType))
 
   private lazy val normalizer: Any => Any = child.dataType match {
-    case FloatType => (input: Any) => {
-      val f = input.asInstanceOf[Float]
-      if (f.isNaN) {
-        Float.NaN
-      } else if (f == -0.0f) {
-        0.0f
-      } else {
-        f
-      }
-    }
-
-    case DoubleType => (input: Any) => {
-      val d = input.asInstanceOf[Double]
-      if (d.isNaN) {
-        Double.NaN
-      } else if (d == -0.0d) {
-        0.0d
-      } else {
-        d
-      }
-    }
+    case FloatType => NormalizeFloatingNumbers.FLOAT_NORMALIZER
+    case DoubleType => NormalizeFloatingNumbers.DOUBLE_NORMALIZER
   }
 
   override def nullSafeEval(input: Any): Any = {
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/HyperLogLogPlusPlusHelper.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/HyperLogLogPlusPlusHelper.scala
index ea619c6..6471a74 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/HyperLogLogPlusPlusHelper.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/HyperLogLogPlusPlusHelper.scala
@@ -22,6 +22,7 @@ import java.util
 
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions.XxHash64Function
+import org.apache.spark.sql.catalyst.optimizer.NormalizeFloatingNumbers.{DOUBLE_NORMALIZER, FLOAT_NORMALIZER}
 import org.apache.spark.sql.types._
 
 // A helper class for HyperLogLogPlusPlus.
@@ -88,7
[spark] branch master updated (c001dd4 -> 6fd2345)
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from c001dd4  [SPARK-33675][INFRA][FOLLOWUP] Schedule branch-3.1 snapshot at master branch
     add 6fd2345  [SPARK-32110][SQL] normalize special floating numbers in HyperLogLog++

No new revisions were added by this update.

Summary of changes:
 .../optimizer/NormalizeFloatingNumbers.scala       | 45 ++-
 .../catalyst/util/HyperLogLogPlusPlusHelper.scala  |  8 +-
 .../sql/catalyst/expressions/PredicateSuite.scala  | 90 ++
 .../aggregate/HyperLogLogPlusPlusSuite.scala       | 24 +-
 4 files changed, 144 insertions(+), 23 deletions(-)
[spark] branch master updated (c05ee06 -> a093d6f)
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from c05ee06  [SPARK-33685][SQL] Migrate DROP VIEW command to use UnresolvedView to resolve the identifier
     add a093d6f  [MINOR] Spelling sql/core

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/ui/static/spark-sql-viz.js |  8 +++---
 .../main/scala/org/apache/spark/sql/Dataset.scala  | 10 +++
 .../spark/sql/execution/DataSourceScanExec.scala   |  6 ++--
 .../apache/spark/sql/execution/ExplainUtils.scala  |  8 +++---
 .../ExternalAppendOnlyUnsafeRowArray.scala         |  2 +-
 .../spark/sql/execution/SparkSqlParser.scala       | 14 +-
 .../sql/execution/WholeStageCodegenExec.scala      |  2 +-
 .../adaptive/AdaptiveSparkPlanHelper.scala         |  2 +-
 .../command/InsertIntoDataSourceDirCommand.scala   |  2 +-
 .../apache/spark/sql/execution/command/ddl.scala   |  4 +--
 .../spark/sql/execution/command/tables.scala       |  2 +-
 .../sql/execution/datasources/DataSource.scala     |  2 +-
 .../datasources/FileFormatDataWriter.scala         | 14 +-
 .../execution/datasources/FileFormatWriter.scala   |  2 +-
 .../execution/datasources/PartitioningUtils.scala  |  2 +-
 .../datasources/v2/WriteToDataSourceV2Exec.scala   |  2 +-
 .../spark/sql/execution/joins/HashedRelation.scala |  4 +--
 .../sql/execution/python/ExtractPythonUDFs.scala   |  6 ++--
 .../streaming/CompactibleFileStreamLog.scala       |  2 +-
 .../sql/execution/streaming/StreamExecution.scala  |  2 +-
 .../state/FlatMapGroupsWithStateExecHelper.scala   |  2 +-
 .../org/apache/spark/sql/internal/HiveSerDe.scala  |  2 +-
 .../spark/sql/streaming/DataStreamWriter.scala     |  4 +--
 .../spark/sql/Java8DatasetAggregatorSuite.java     | 16 +--
 .../spark/sql/JavaDatasetAggregatorSuite.java      | 24 
 .../inputs/ansi/decimalArithmeticOperations.sql    |  2 +-
 .../sql-tests/inputs/postgreSQL/create_view.sql    |  2 +-
 .../org/apache/spark/sql/CachedTableSuite.scala    |  8 +++---
 .../org/apache/spark/sql/DataFrameSuite.scala      |  2 +-
 .../org/apache/spark/sql/DatasetCacheSuite.scala   | 13 +
 .../apache/spark/sql/DatasetPrimitiveSuite.scala   |  8 +++---
 .../scala/org/apache/spark/sql/DatasetSuite.scala  | 32 +++---
 .../org/apache/spark/sql/DateFunctionsSuite.scala  |  6 ++--
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala |  6 ++--
 .../org/apache/spark/sql/SQLQueryTestSuite.scala   | 10 +++
 .../spark/sql/SparkSessionExtensionSuite.scala     | 18 ++--
 .../org/apache/spark/sql/TPCDSTableStats.scala     |  2 +-
 .../spark/sql/connector/DataSourceV2SQLSuite.scala | 12 
 .../sql/execution/SQLWindowFunctionSuite.scala     |  2 +-
 .../spark/sql/execution/SparkSqlParserSuite.scala  |  2 +-
 .../sql/execution/WholeStageCodegenSuite.scala     |  4 +--
 .../adaptive/AdaptiveQueryExecSuite.scala          |  8 +++---
 .../sql/execution/arrow/ArrowConvertersSuite.scala |  2 +-
 .../spark/sql/execution/command/DDLSuite.scala     | 12 
 .../execution/command/PlanResolutionSuite.scala    | 16 +--
 .../execution/datasources/DataSourceSuite.scala    |  4 +--
 .../execution/datasources/SchemaPruningSuite.scala |  8 +++---
 .../parquet/ParquetInteroperabilitySuite.scala     |  2 +-
 .../parquet/ParquetPartitionDiscoverySuite.scala   |  4 +--
 .../datasources/parquet/ParquetQuerySuite.scala    |  4 +--
 .../exchange/EnsureRequirementsSuite.scala         |  2 +-
 .../sql/execution/metric/SQLMetricsSuite.scala     |  2 +-
 .../execution/streaming/HDFSMetadataLogSuite.scala |  2 +-
 .../sql/execution/ui/SparkPlanInfoSuite.scala      |  6 ++--
 .../sql/internal/ExecutorSideSQLConfSuite.scala    |  4 +--
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala      | 12 
 .../spark/sql/sources/BucketedReadSuite.scala      | 18 ++--
 .../sql/sources/CreateTableAsSelectSuite.scala     |  2 +-
 .../apache/spark/sql/sources/TableScanSuite.scala  |  6 ++--
 .../sql/streaming/FileStreamSourceSuite.scala      |  4 +--
 .../apache/spark/sql/streaming/StreamSuite.scala   |  8 +++---
 .../streaming/test/DataStreamTableAPISuite.scala   |  8 +++---
 .../org/apache/spark/sql/test/SQLTestData.scala    |  4 +--
 .../apache/spark/sql/test/SharedSparkSession.scala |  2 +-
 64 files changed, 208 insertions(+), 205 deletions(-)
[spark] branch master updated: [SPARK-33685][SQL] Migrate DROP VIEW command to use UnresolvedView to resolve the identifier
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/master by this push:
     new c05ee06  [SPARK-33685][SQL] Migrate DROP VIEW command to use UnresolvedView to resolve the identifier
c05ee06 is described below

commit c05ee06f5b711dd261dc94a01b4ba4ffccdf2ea0
Author: Terry Kim
AuthorDate: Tue Dec 8 14:07:58 2020 +

    [SPARK-33685][SQL] Migrate DROP VIEW command to use UnresolvedView to resolve the identifier

    ### What changes were proposed in this pull request?

    This PR introduces `UnresolvedView` in the resolution framework to resolve the identifier. This PR then migrates `DROP VIEW` to use `UnresolvedView` to resolve the table/view identifier. This allows consistent resolution rules (temp view first, etc.) to be applied for both v1/v2 commands. More info about the consistent resolution rule proposal can be found in [JIRA](https://issues.apache.org/jira/browse/SPARK-29900) or the [proposal doc](https://docs.google.com/document/d/1hvLjGA8y_W_hhilpngXVub1Ebv8RsMap986nENCFnrg/edit?usp=sharing).

    ### Why are the changes needed?

    To use `UnresolvedView` for view resolution. Note that there is no resolution behavior change with this PR.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Updated existing tests.

    Closes #30636 from imback82/drop_view_v2.
Authored-by: Terry Kim
Signed-off-by: Wenchen Fan
---
 .../spark/sql/catalyst/analysis/Analyzer.scala     | 17 ++--
 .../sql/catalyst/analysis/CheckAnalysis.scala      | 13 +++--
 .../sql/catalyst/analysis/ResolveCatalogs.scala    |  5
 ...ble.scala => ResolveCommandsWithIfExists.scala} | 14 +-
 .../sql/catalyst/analysis/v2ResolutionPlans.scala  | 13 +
 .../spark/sql/catalyst/parser/AstBuilder.scala     |  9 ---
 .../sql/catalyst/plans/logical/statements.scala    |  7 -
 .../sql/catalyst/plans/logical/v2Commands.scala    | 15 +--
 .../spark/sql/catalyst/parser/DDLParserSuite.scala | 17 +++-
 .../catalyst/analysis/ResolveSessionCatalog.scala  |  5 ++--
 .../datasources/v2/DataSourceV2Strategy.scala      |  2 +-
 .../spark/sql/connector/DataSourceV2SQLSuite.scala | 14 +-
 .../spark/sql/execution/command/DDLSuite.scala     |  5 ++--
 .../execution/command/PlanResolutionSuite.scala    | 31 +++---
 .../spark/sql/hive/execution/HiveDDLSuite.scala    |  3 ++-
 15 files changed, 118 insertions(+), 52 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 680ec98..6b0cf4b 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -150,7 +150,7 @@ object AnalysisContext {
  * [[UnresolvedRelation]]s into fully typed objects using information in a [[SessionCatalog]].
  */
 class Analyzer(override val catalogManager: CatalogManager)
-  extends RuleExecutor[LogicalPlan] with CheckAnalysis with LookupCatalog with SQLConfHelper {
+  extends RuleExecutor[LogicalPlan] with CheckAnalysis with SQLConfHelper {

   private val v1SessionCatalog: SessionCatalog = catalogManager.v1SessionCatalog

@@ -277,7 +277,7 @@ class Analyzer(override val catalogManager: CatalogManager)
         TypeCoercion.typeCoercionRules ++
         extendedResolutionRules : _*),
     Batch("Post-Hoc Resolution", Once,
-      Seq(ResolveNoopDropTable) ++
+      Seq(ResolveCommandsWithIfExists) ++
         postHocResolutionRules: _*),
     Batch("Normalize Alter Table", Once, ResolveAlterTableChanges),
     Batch("Remove Unresolved Hints", Once,
@@ -889,6 +889,11 @@ class Analyzer(override val catalogManager: CatalogManager)
           u.failAnalysis(s"${ident.quoted} is a temp view. '$cmd' expects a table")
         }
         u
+      case u @ UnresolvedView(ident, _, _) =>
+        lookupTempView(ident).map { _ =>
+          ResolvedView(ident.asIdentifier, isTemp = true)
+        }
+        .getOrElse(u)
       case u @ UnresolvedTableOrView(ident, cmd, allowTempView) =>
         lookupTempView(ident)
           .map { _ =>
@@ -1113,6 +1118,14 @@ class Analyzer(override val catalogManager: CatalogManager)
         case table => table
       }.getOrElse(u)

+      case u @ UnresolvedView(identifier, cmd, relationTypeMismatchHint) =>
+        lookupTableOrView(identifier).map {
+          case v: ResolvedView => v
+          case _ =>
+            u.failAnalysis(s"${identifier.quoted} is a table. '$cmd' expects a view." +
+              relationTypeMismatchHint.map(" " + _).getOrElse(""))
+        }.getOrElse(u)
+
       case u @
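The two `UnresolvedView` cases in the hunks above encode a fixed lookup order: temporary views are checked first, then the catalog, and an identifier that turns out to name a table fails analysis with an "expects a view" error. A minimal Python sketch of that order follows; the function name `resolve_view`, the tuple results, and the one-level string-keyed catalog are illustrative stand-ins, not Spark's API:

```python
class AnalysisError(Exception):
    """Stand-in for Spark's analysis-time failure (failAnalysis)."""

def resolve_view(ident, temp_views, catalog):
    """Resolve a DROP VIEW identifier: temp views shadow catalog objects."""
    if ident in temp_views:
        return ("ResolvedView", ident, True)   # isTemp = True
    kind = catalog.get(ident)
    if kind == "view":
        return ("ResolvedView", ident, False)  # isTemp = False
    if kind == "table":
        # Mirrors: "<ident> is a table. 'DROP VIEW' expects a view."
        raise AnalysisError(f"{ident} is a table. 'DROP VIEW' expects a view.")
    return ("Unresolved", ident)               # left unresolved, like .getOrElse(u)
```

A temp view with the same name as a table wins the lookup, which is exactly the "temp view first" consistency the PR description refers to.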
[spark] branch master updated (99613cd -> 2b30dde)
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 99613cd  [SPARK-33677][SQL] Skip LikeSimplification rule if pattern contains any escapeChar
     add 2b30dde  [SPARK-33688][SQL] Migrate SHOW TABLE EXTENDED to new resolution framework

No new revisions were added by this update.

Summary of changes:
 .../apache/spark/sql/catalyst/parser/SqlBase.g4    |  2 +-
 .../spark/sql/catalyst/analysis/Analyzer.scala     |  2 ++
 .../spark/sql/catalyst/parser/AstBuilder.scala     | 15
 .../sql/catalyst/plans/logical/statements.scala    |  9
 .../sql/catalyst/plans/logical/v2Commands.scala    | 20 ++--
 .../catalyst/analysis/ResolveSessionCatalog.scala  | 20 ++--
 .../datasources/v2/DataSourceV2Strategy.scala      |  3 +++
 .../execution/command/ShowTablesParserSuite.scala  | 27 ++
 .../sql/execution/command/v2/ShowTablesSuite.scala |  7 +++---
 9 files changed, 67 insertions(+), 38 deletions(-)
[spark] branch branch-3.0 updated: [SPARK-33677][SQL] Skip LikeSimplification rule if pattern contains any escapeChar
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new ea7c2a1  [SPARK-33677][SQL] Skip LikeSimplification rule if pattern contains any escapeChar
ea7c2a1 is described below

commit ea7c2a15a02d8c8cf3e3f5a1260da76829d59596
Author: luluorta
AuthorDate: Tue Dec 8 20:45:25 2020 +0900

    [SPARK-33677][SQL] Skip LikeSimplification rule if pattern contains any escapeChar

    ### What changes were proposed in this pull request?

    The `LikeSimplification` rule does not work correctly for many cases whose patterns contain escape characters, for example:

    `SELECT s LIKE 'm%aca' ESCAPE '%' FROM t`
    `SELECT s LIKE 'maacaa' ESCAPE 'a' FROM t`

    For simplicity, this PR makes the rule simply be skipped if `pattern` contains any `escapeChar`.

    ### Why are the changes needed?

    The rule corrupts query results.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Added a unit test.

    Closes #30625 from luluorta/SPARK-33677.
Authored-by: luluorta
Signed-off-by: Takeshi Yamamuro
(cherry picked from commit 99613cd5815b2de12274027dee0c0a6c0c57bd95)
Signed-off-by: Takeshi Yamamuro
---
 .../spark/sql/catalyst/optimizer/expressions.scala | 18 +---
 .../optimizer/LikeSimplificationSuite.scala        | 48 ++
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 14 +++
 3 files changed, 74 insertions(+), 6 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
index 7773f5c..7cbeb47 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
@@ -525,27 +525,33 @@ object LikeSimplification extends Rule[LogicalPlan] {
   private val equalTo = "([^_%]*)".r

   def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
-    case Like(input, Literal(pattern, StringType), escapeChar) =>
+    case l @ Like(input, Literal(pattern, StringType), escapeChar) =>
       if (pattern == null) {
         // If pattern is null, return null value directly, since "col like null" == null.
         Literal(null, BooleanType)
       } else {
-        val escapeStr = String.valueOf(escapeChar)
         pattern.toString match {
-          case startsWith(prefix) if !prefix.endsWith(escapeStr) =>
+          // There are three different situations when pattern containing escapeChar:
+          // 1. pattern contains invalid escape sequence, e.g. 'm\aca'
+          // 2. pattern contains escaped wildcard character, e.g. 'ma\%ca'
+          // 3. pattern contains escaped escape character, e.g. 'ma\\ca'
+          // Although there are patterns can be optimized if we handle the escape first, we just
+          // skip this rule if pattern contains any escapeChar for simplicity.
+          case p if p.contains(escapeChar) => l
+          case startsWith(prefix) =>
             StartsWith(input, Literal(prefix))
           case endsWith(postfix) =>
             EndsWith(input, Literal(postfix))
           // 'a%a' pattern is basically same with 'a%' && '%a'.
           // However, the additional `Length` condition is required to prevent 'a' match 'a%a'.
-          case startsAndEndsWith(prefix, postfix) if !prefix.endsWith(escapeStr) =>
+          case startsAndEndsWith(prefix, postfix) =>
             And(GreaterThanOrEqual(Length(input), Literal(prefix.length + postfix.length)),
               And(StartsWith(input, Literal(prefix)), EndsWith(input, Literal(postfix))))
-          case contains(infix) if !infix.endsWith(escapeStr) =>
+          case contains(infix) =>
             Contains(input, Literal(infix))
           case equalTo(str) =>
             EqualTo(input, Literal(str))
-          case _ => Like(input, Literal.create(pattern, StringType), escapeChar)
+          case _ => l
         }
       }
   }
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/LikeSimplificationSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/LikeSimplificationSuite.scala
index 436f62e..1812dce 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/LikeSimplificationSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/LikeSimplificationSuite.scala
@@ -116,4 +116,52 @@ class LikeSimplificationSuite extends PlanTest {
     val optimized2 = Optimize.execute(originalQuery2.analyze)
     comparePlans(optimized2, originalQuery2.analyze)
   }
+
+  test("SPARK-33677: LikeSimplification should be skipped if pattern
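The hunk above replaces the per-case `!prefix.endsWith(escapeStr)` guards with one early bail-out (`case p if p.contains(escapeChar) => l`). Why the old guards were insufficient is easiest to see by desugaring LIKE semantics: `'maacaa' ESCAPE 'a'` denotes the literal string `maca`, yet without the bail-out the rule would simplify it to an equality test against `maacaa`. The following Python sketch models the semantics and the fixed rule; it is illustrative only, not Spark's Scala implementation:

```python
import re

def like_to_regex(pattern: str, escape: str) -> str:
    """Desugar a SQL LIKE pattern into an anchored regex, honoring ESCAPE."""
    out, i = [], 0
    while i < len(pattern):
        c = pattern[i]
        if c == escape and i + 1 < len(pattern):
            out.append(re.escape(pattern[i + 1]))  # escaped char matches literally
            i += 2
        elif c == "%":
            out.append(".*")   # % matches any sequence of characters
            i += 1
        elif c == "_":
            out.append(".")    # _ matches any single character
            i += 1
        else:
            out.append(re.escape(c))
            i += 1
    return "^" + "".join(out) + "$"

def simplify_like(pattern: str, escape: str):
    """Mimic the fixed rule: bail out whenever the pattern contains escapeChar."""
    if escape in pattern:
        return None  # SPARK-33677: fall back to full LIKE evaluation
    if pattern.endswith("%") and "%" not in pattern[:-1] and "_" not in pattern:
        return ("startsWith", pattern[:-1])
    if pattern.startswith("%") and "%" not in pattern[1:] and "_" not in pattern:
        return ("endsWith", pattern[1:])
    if "%" not in pattern and "_" not in pattern:
        return ("equalTo", pattern)
    return None  # other shapes left to the real rule's remaining cases
```

With the bail-out removed, `simplify_like("maacaa", "a")` would hit the `equalTo` branch and compare against the raw pattern `maacaa`, which is the result corruption the PR fixes.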
[spark] branch master updated (031c5ef -> 99613cd)
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from 031c5ef  [SPARK-33679][SQL] Enable spark.sql.adaptive.enabled by default
     add 99613cd  [SPARK-33677][SQL] Skip LikeSimplification rule if pattern contains any escapeChar

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/catalyst/optimizer/expressions.scala | 18 +---
 .../optimizer/LikeSimplificationSuite.scala        | 48 ++
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 14 +++
 3 files changed, 74 insertions(+), 6 deletions(-)
[spark] branch branch-3.1 updated: [SPARK-33677][SQL] Skip LikeSimplification rule if pattern contains any escapeChar
This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.1
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.1 by this push:
     new 54a73ab  [SPARK-33677][SQL] Skip LikeSimplification rule if pattern contains any escapeChar
54a73ab is described below

commit 54a73ab7bb062b0fad3b8925d03bb4dca9fdc17a
Author: luluorta
AuthorDate: Tue Dec 8 20:45:25 2020 +0900

    [SPARK-33677][SQL] Skip LikeSimplification rule if pattern contains any escapeChar

    ### What changes were proposed in this pull request?

    The `LikeSimplification` rule does not work correctly for many cases whose patterns contain escape characters, for example:

    `SELECT s LIKE 'm%aca' ESCAPE '%' FROM t`
    `SELECT s LIKE 'maacaa' ESCAPE 'a' FROM t`

    For simplicity, this PR makes the rule simply be skipped if `pattern` contains any `escapeChar`.

    ### Why are the changes needed?

    The rule corrupts query results.

    ### Does this PR introduce _any_ user-facing change?

    No.

    ### How was this patch tested?

    Added a unit test.

    Closes #30625 from luluorta/SPARK-33677.
Authored-by: luluorta
Signed-off-by: Takeshi Yamamuro
(cherry picked from commit 99613cd5815b2de12274027dee0c0a6c0c57bd95)
Signed-off-by: Takeshi Yamamuro
---
 .../spark/sql/catalyst/optimizer/expressions.scala | 18 +---
 .../optimizer/LikeSimplificationSuite.scala        | 48 ++
 .../scala/org/apache/spark/sql/SQLQuerySuite.scala | 14 +++
 3 files changed, 74 insertions(+), 6 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
index 1b1e2ad..b2fc334 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala
@@ -543,27 +543,33 @@ object LikeSimplification extends Rule[LogicalPlan] {
   private val equalTo = "([^_%]*)".r

   def apply(plan: LogicalPlan): LogicalPlan = plan transformAllExpressions {
-    case Like(input, Literal(pattern, StringType), escapeChar) =>
+    case l @ Like(input, Literal(pattern, StringType), escapeChar) =>
       if (pattern == null) {
         // If pattern is null, return null value directly, since "col like null" == null.
         Literal(null, BooleanType)
       } else {
-        val escapeStr = String.valueOf(escapeChar)
         pattern.toString match {
-          case startsWith(prefix) if !prefix.endsWith(escapeStr) =>
+          // There are three different situations when pattern containing escapeChar:
+          // 1. pattern contains invalid escape sequence, e.g. 'm\aca'
+          // 2. pattern contains escaped wildcard character, e.g. 'ma\%ca'
+          // 3. pattern contains escaped escape character, e.g. 'ma\\ca'
+          // Although there are patterns can be optimized if we handle the escape first, we just
+          // skip this rule if pattern contains any escapeChar for simplicity.
+          case p if p.contains(escapeChar) => l
+          case startsWith(prefix) =>
             StartsWith(input, Literal(prefix))
           case endsWith(postfix) =>
             EndsWith(input, Literal(postfix))
           // 'a%a' pattern is basically same with 'a%' && '%a'.
           // However, the additional `Length` condition is required to prevent 'a' match 'a%a'.
-          case startsAndEndsWith(prefix, postfix) if !prefix.endsWith(escapeStr) =>
+          case startsAndEndsWith(prefix, postfix) =>
             And(GreaterThanOrEqual(Length(input), Literal(prefix.length + postfix.length)),
               And(StartsWith(input, Literal(prefix)), EndsWith(input, Literal(postfix))))
-          case contains(infix) if !infix.endsWith(escapeStr) =>
+          case contains(infix) =>
             Contains(input, Literal(infix))
           case equalTo(str) =>
             EqualTo(input, Literal(str))
-          case _ => Like(input, Literal.create(pattern, StringType), escapeChar)
+          case _ => l
         }
       }
   }
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/LikeSimplificationSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/LikeSimplificationSuite.scala
index 436f62e..1812dce 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/LikeSimplificationSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/LikeSimplificationSuite.scala
@@ -116,4 +116,52 @@ class LikeSimplificationSuite extends PlanTest {
     val optimized2 = Optimize.execute(originalQuery2.analyze)
     comparePlans(optimized2, originalQuery2.analyze)
   }
+
+  test("SPARK-33677: LikeSimplification should be skipped if pattern