svn commit: r32116 - in /dev/spark/2.4.1-SNAPSHOT-2019_01_23_23_03-63fa6f5-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2019-01-23 Thread pwendell
Author: pwendell
Date: Thu Jan 24 07:18:16 2019
New Revision: 32116

Log:
Apache Spark 2.4.1-SNAPSHOT-2019_01_23_23_03-63fa6f5 docs


[This commit notification would consist of 1476 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r32113 - in /dev/spark/3.0.0-SNAPSHOT-2019_01_23_20_51-d5a97c1-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2019-01-23 Thread pwendell
Author: pwendell
Date: Thu Jan 24 05:03:45 2019
New Revision: 32113

Log:
Apache Spark 3.0.0-SNAPSHOT-2019_01_23_20_51-d5a97c1 docs


[This commit notification would consist of 1781 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-26682][SQL] Use taskAttemptID instead of attemptNumber for Hadoop.

2019-01-23 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d5a97c1  [SPARK-26682][SQL] Use taskAttemptID instead of attemptNumber 
for Hadoop.
d5a97c1 is described below

commit d5a97c1c2c86ae335e91008fa25b3359c4560915
Author: Ryan Blue 
AuthorDate: Thu Jan 24 12:45:25 2019 +0800

[SPARK-26682][SQL] Use taskAttemptID instead of attemptNumber for Hadoop.

## What changes were proposed in this pull request?

Updates the attempt ID used by FileFormatWriter. Tasks in different stage 
attempts can end up with the same task attempt number and could conflict. Using 
Spark's task attempt ID guarantees that Hadoop TaskAttemptID instances are unique.
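
As an illustrative aside (a minimal sketch, not Spark's own code), the replaced
expression narrows the globally unique Long task attempt ID to an Int and masks
off the sign bit so the value passed as `sparkAttemptNumber` stays non-negative:

```scala
object AttemptIdSketch {
  // Mirror of the patched expression: narrow the Long ID to an Int, then clear
  // the sign bit so the result is never negative.
  def toNonNegativeInt(taskAttemptId: Long): Int =
    taskAttemptId.toInt & Integer.MAX_VALUE

  def main(args: Array[String]): Unit = {
    // Tasks from different stage attempts may share attemptNumber = 0, but the
    // underlying Long task attempt ID is unique, and its masked value is >= 0.
    Seq(0L, 1L, (1L << 32) + 7L, Long.MaxValue).foreach { id =>
      println(s"taskAttemptId=$id -> sparkAttemptNumber=${toNonNegativeInt(id)}")
    }
  }
}
```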

## How was this patch tested?

Existing tests. Also validated that we no longer detect this failure case 
in our logs after deployment.

Closes #23608 from rdblue/SPARK-26682-fix-hadoop-task-attempt-id.

Authored-by: Ryan Blue 
Signed-off-by: Wenchen Fan 
---
 .../org/apache/spark/sql/execution/datasources/FileFormatWriter.scala   | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
index 260ad97..91e92d8 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
@@ -170,7 +170,7 @@ object FileFormatWriter extends Logging {
 description = description,
 sparkStageId = taskContext.stageId(),
 sparkPartitionId = taskContext.partitionId(),
-sparkAttemptNumber = taskContext.attemptNumber(),
+sparkAttemptNumber = taskContext.taskAttemptId().toInt & 
Integer.MAX_VALUE,
 committer,
 iterator = iter)
 },


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.4 updated: [SPARK-26682][SQL] Use taskAttemptID instead of attemptNumber for Hadoop.

2019-01-23 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 63fa6f5  [SPARK-26682][SQL] Use taskAttemptID instead of attemptNumber 
for Hadoop.
63fa6f5 is described below

commit 63fa6f5abc0c529d017243a4eea505c1c4cbbbd4
Author: Ryan Blue 
AuthorDate: Thu Jan 24 12:45:25 2019 +0800

[SPARK-26682][SQL] Use taskAttemptID instead of attemptNumber for Hadoop.

## What changes were proposed in this pull request?

Updates the attempt ID used by FileFormatWriter. Tasks in different stage 
attempts can end up with the same task attempt number and could conflict. Using 
Spark's task attempt ID guarantees that Hadoop TaskAttemptID instances are unique.

## How was this patch tested?

Existing tests. Also validated that we no longer detect this failure case 
in our logs after deployment.

Closes #23608 from rdblue/SPARK-26682-fix-hadoop-task-attempt-id.

Authored-by: Ryan Blue 
Signed-off-by: Wenchen Fan 
(cherry picked from commit d5a97c1c2c86ae335e91008fa25b3359c4560915)
Signed-off-by: Wenchen Fan 
---
 .../org/apache/spark/sql/execution/datasources/FileFormatWriter.scala   | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
index 774fe38..2103a2d 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala
@@ -170,7 +170,7 @@ object FileFormatWriter extends Logging {
 description = description,
 sparkStageId = taskContext.stageId(),
 sparkPartitionId = taskContext.partitionId(),
-sparkAttemptNumber = taskContext.attemptNumber(),
+sparkAttemptNumber = taskContext.taskAttemptId().toInt & 
Integer.MAX_VALUE,
 committer,
 iterator = iter)
 },


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r32112 - in /dev/spark/2.4.1-SNAPSHOT-2019_01_23_18_40-921c22b-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2019-01-23 Thread pwendell
Author: pwendell
Date: Thu Jan 24 02:56:27 2019
New Revision: 32112

Log:
Apache Spark 2.4.1-SNAPSHOT-2019_01_23_18_40-921c22b docs


[This commit notification would consist of 1476 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r32111 - in /dev/spark/2.3.4-SNAPSHOT-2019_01_23_18_40-23e35d4-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2019-01-23 Thread pwendell
Author: pwendell
Date: Thu Jan 24 02:54:33 2019
New Revision: 32111

Log:
Apache Spark 2.3.4-SNAPSHOT-2019_01_23_18_40-23e35d4 docs


[This commit notification would consist of 1443 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-26617][SQL] Cache manager locks

2019-01-23 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d0e9219  [SPARK-26617][SQL] Cache manager locks
d0e9219 is described below

commit d0e9219e03373b0db918d51bafe6a97775e2c65f
Author: Dave DeCaprio 
AuthorDate: Thu Jan 24 10:48:48 2019 +0800

[SPARK-26617][SQL] Cache manager locks

## What changes were proposed in this pull request?

Fixed several places in CacheManager where a write lock was being held 
while running the query optimizer.  This could cause a very long block if the 
query optimization takes a long time.  This builds on changes from 
[SPARK-26548] that fixed this issue for one specific case in the CacheManager.
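
A hedged sketch of the resulting pattern (illustrative only, with made-up entry
names, not the actual CacheManager code): collect the matching entries while
holding the write lock, then do the potentially slow work after releasing it.

```scala
import java.util.concurrent.locks.ReentrantReadWriteLock
import scala.collection.mutable

object LockNarrowingSketch {
  private val lock = new ReentrantReadWriteLock()
  private def writeLock[A](f: => A): A = {
    val l = lock.writeLock(); l.lock()
    try f finally l.unlock()
  }

  // Stand-in for the cached plans tracked by the manager.
  private val entries = mutable.Buffer("plan-a", "plan-b", "plan-c")

  def uncacheMatching(shouldRemove: String => Boolean): Unit = {
    // Phase 1: mutate the shared registry under the write lock (fast).
    val toUncache = writeLock {
      val hits = entries.filter(shouldRemove)
      entries --= hits
      hits
    }
    // Phase 2: do the slow work (e.g. clearing caches) without holding the lock.
    toUncache.foreach(e => println(s"clearing cache for $e"))
  }

  def main(args: Array[String]): Unit = uncacheMatching(_.endsWith("b"))
}
```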

gatorsmile This is very similar to the PR you approved last week.

## How was this patch tested?

Has been tested on a live system where the blocking was causing major 
issues and it is working well.
CacheManager has no explicit unit test but is used in many places 
internally as part of the SharedState.

Closes #23539 from DaveDeCaprio/cache-manager-locks.

Lead-authored-by: Dave DeCaprio 
Co-authored-by: David DeCaprio 
Signed-off-by: Wenchen Fan 
---
 .../apache/spark/sql/execution/CacheManager.scala  | 69 ++
 1 file changed, 43 insertions(+), 26 deletions(-)

diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala
index 728fde5..0e6627c 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/CacheManager.scala
@@ -20,6 +20,7 @@ package org.apache.spark.sql.execution
 import java.util.concurrent.locks.ReentrantReadWriteLock
 
 import scala.collection.JavaConverters._
+import scala.collection.mutable
 
 import org.apache.hadoop.fs.{FileSystem, Path}
 
@@ -120,7 +121,7 @@ class CacheManager extends Logging {
   def uncacheQuery(
   query: Dataset[_],
   cascade: Boolean,
-  blocking: Boolean = true): Unit = writeLock {
+  blocking: Boolean = true): Unit = {
 uncacheQuery(query.sparkSession, query.logicalPlan, cascade, blocking)
   }
 
@@ -136,21 +137,27 @@ class CacheManager extends Logging {
   spark: SparkSession,
   plan: LogicalPlan,
   cascade: Boolean,
-  blocking: Boolean): Unit = writeLock {
+  blocking: Boolean): Unit = {
 val shouldRemove: LogicalPlan => Boolean =
   if (cascade) {
 _.find(_.sameResult(plan)).isDefined
   } else {
 _.sameResult(plan)
   }
-val it = cachedData.iterator()
-while (it.hasNext) {
-  val cd = it.next()
-  if (shouldRemove(cd.plan)) {
-cd.cachedRepresentation.cacheBuilder.clearCache(blocking)
-it.remove()
+val plansToUncache = mutable.Buffer[CachedData]()
+writeLock {
+  val it = cachedData.iterator()
+  while (it.hasNext) {
+val cd = it.next()
+if (shouldRemove(cd.plan)) {
+  plansToUncache += cd
+  it.remove()
+}
   }
 }
+plansToUncache.foreach { cd =>
+  cd.cachedRepresentation.cacheBuilder.clearCache(blocking)
+}
 // Re-compile dependent cached queries after removing the cached query.
 if (!cascade) {
   recacheByCondition(spark, _.find(_.sameResult(plan)).isDefined, 
clearCache = false)
@@ -160,7 +167,7 @@ class CacheManager extends Logging {
   /**
* Tries to re-cache all the cache entries that refer to the given plan.
*/
-  def recacheByPlan(spark: SparkSession, plan: LogicalPlan): Unit = writeLock {
+  def recacheByPlan(spark: SparkSession, plan: LogicalPlan): Unit = {
 recacheByCondition(spark, _.find(_.sameResult(plan)).isDefined)
   }
 
@@ -168,26 +175,36 @@ class CacheManager extends Logging {
   spark: SparkSession,
   condition: LogicalPlan => Boolean,
   clearCache: Boolean = true): Unit = {
-val it = cachedData.iterator()
 val needToRecache = scala.collection.mutable.ArrayBuffer.empty[CachedData]
-while (it.hasNext) {
-  val cd = it.next()
-  if (condition(cd.plan)) {
-if (clearCache) {
-  cd.cachedRepresentation.cacheBuilder.clearCache()
+writeLock {
+  val it = cachedData.iterator()
+  while (it.hasNext) {
+val cd = it.next()
+if (condition(cd.plan)) {
+  needToRecache += cd
+  // Remove the cache entry before we create a new one, so that we can 
have a different
+  // physical plan.
+  it.remove()
+}
+  }
+}
+needToRecache.map { cd =>
+  if (clearCache) {
+cd.cachedRepresentation.cacheBuilder.clearCache()
+  }
+  val plan = spark.sessionState.executePlan(cd.plan).executedPlan
+  val newCache = 

[spark] branch master updated: [SPARK-25713][SQL] implementing copy for ColumnArray

2019-01-23 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 11be22b  [SPARK-25713][SQL] implementing copy for ColumnArray
11be22b is described below

commit 11be22bb5e4784578c2f9ec1b80b30b2cf0ac3c7
Author: ayudovin 
AuthorDate: Thu Jan 24 10:35:44 2019 +0800

[SPARK-25713][SQL] implementing copy for ColumnArray

## What changes were proposed in this pull request?

Implement copy() for ColumnarArray
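
For context, a hedged usage sketch (assumes spark-sql on the classpath and a
build that includes this change; `OnHeapColumnVector` is an internal class used
here only for illustration): `ColumnarArray` is a view over a column vector,
and `copy()` materializes the values so they outlive the backing vector.

```scala
import org.apache.spark.sql.execution.vectorized.OnHeapColumnVector
import org.apache.spark.sql.types.IntegerType
import org.apache.spark.sql.vectorized.ColumnarArray

object ColumnarArrayCopySketch {
  def main(args: Array[String]): Unit = {
    val vector = new OnHeapColumnVector(10, IntegerType)
    (0 until 10).foreach(i => vector.putInt(i, i))

    val view = new ColumnarArray(vector, 0, 10)
    val copied = view.copy() // materializes the 10 ints into an independent ArrayData

    vector.close() // the copy no longer depends on the vector
    (0 until 10).foreach(i => assert(copied.getInt(i) == i))
    println("copied values remain readable after the vector is closed")
  }
}
```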

## How was this patch tested?
 Updated the existing tests in ColumnVectorSuite to also check the copied array.

Closes #23569 from ayudovin/copy-for-columnArray.

Authored-by: ayudovin 
Signed-off-by: Wenchen Fan 
---
 .../apache/spark/sql/vectorized/ColumnarArray.java | 22 +-
 .../execution/vectorized/ColumnVectorSuite.scala   | 18 ++
 2 files changed, 39 insertions(+), 1 deletion(-)

diff --git 
a/sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnarArray.java 
b/sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnarArray.java
index dd2bd78..1471627 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnarArray.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/vectorized/ColumnarArray.java
@@ -17,7 +17,9 @@
 package org.apache.spark.sql.vectorized;
 
 import org.apache.spark.annotation.Evolving;
+import org.apache.spark.sql.catalyst.expressions.UnsafeArrayData;
 import org.apache.spark.sql.catalyst.util.ArrayData;
+import org.apache.spark.sql.catalyst.util.GenericArrayData;
 import org.apache.spark.sql.types.*;
 import org.apache.spark.unsafe.types.CalendarInterval;
 import org.apache.spark.unsafe.types.UTF8String;
@@ -46,7 +48,25 @@ public final class ColumnarArray extends ArrayData {
 
   @Override
   public ArrayData copy() {
-throw new UnsupportedOperationException();
+DataType dt = data.dataType();
+
+if (dt instanceof BooleanType) {
+  return UnsafeArrayData.fromPrimitiveArray(toBooleanArray());
+} else if (dt instanceof ByteType) {
+  return UnsafeArrayData.fromPrimitiveArray(toByteArray());
+} else if (dt instanceof ShortType) {
+  return UnsafeArrayData.fromPrimitiveArray(toShortArray());
+} else if (dt instanceof IntegerType) {
+  return UnsafeArrayData.fromPrimitiveArray(toIntArray());
+} else if (dt instanceof LongType) {
+  return UnsafeArrayData.fromPrimitiveArray(toLongArray());
+} else if (dt instanceof FloatType) {
+  return UnsafeArrayData.fromPrimitiveArray(toFloatArray());
+} else if (dt instanceof DoubleType) {
+  return UnsafeArrayData.fromPrimitiveArray(toDoubleArray());
+} else {
+  return new GenericArrayData(toObjectArray(dt));
+}
   }
 
   @Override
diff --git 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnVectorSuite.scala
 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnVectorSuite.scala
index 2d1ad4b..866fcb1 100644
--- 
a/sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnVectorSuite.scala
+++ 
b/sql/core/src/test/scala/org/apache/spark/sql/execution/vectorized/ColumnVectorSuite.scala
@@ -58,9 +58,11 @@ class ColumnVectorSuite extends SparkFunSuite with 
BeforeAndAfterEach {
 }
 
 val array = new ColumnarArray(testVector, 0, 10)
+val arrayCopy = array.copy()
 
 (0 until 10).foreach { i =>
   assert(array.get(i, BooleanType) === (i % 2 == 0))
+  assert(arrayCopy.get(i, BooleanType) === (i % 2 == 0))
 }
   }
 
@@ -70,9 +72,11 @@ class ColumnVectorSuite extends SparkFunSuite with 
BeforeAndAfterEach {
 }
 
 val array = new ColumnarArray(testVector, 0, 10)
+val arrayCopy = array.copy()
 
 (0 until 10).foreach { i =>
   assert(array.get(i, ByteType) === i.toByte)
+  assert(arrayCopy.get(i, ByteType) === i.toByte)
 }
   }
 
@@ -82,9 +86,11 @@ class ColumnVectorSuite extends SparkFunSuite with 
BeforeAndAfterEach {
 }
 
 val array = new ColumnarArray(testVector, 0, 10)
+val arrayCopy = array.copy()
 
 (0 until 10).foreach { i =>
   assert(array.get(i, ShortType) === i.toShort)
+  assert(arrayCopy.get(i, ShortType) === i.toShort)
 }
   }
 
@@ -94,9 +100,11 @@ class ColumnVectorSuite extends SparkFunSuite with 
BeforeAndAfterEach {
 }
 
 val array = new ColumnarArray(testVector, 0, 10)
+val arrayCopy = array.copy()
 
 (0 until 10).foreach { i =>
   assert(array.get(i, IntegerType) === i)
+  assert(arrayCopy.get(i, IntegerType) === i)
 }
   }
 
@@ -106,9 +114,11 @@ class ColumnVectorSuite extends SparkFunSuite with 
BeforeAndAfterEach {
 }
 
 val array = new ColumnarArray(testVector, 0, 10)
+val arrayCopy = array.copy()
 
 (0 until 10).foreach { i =>
   assert(array.get(i, LongType) === i)
+  

[spark] branch branch-2.3 updated: [SPARK-26706][SQL][FOLLOWUP] Fix `illegalNumericPrecedence` for ByteType

2019-01-23 Thread dbtsai
This is an automated email from the ASF dual-hosted git repository.

dbtsai pushed a commit to branch branch-2.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.3 by this push:
 new 23e35d4  [SPARK-26706][SQL][FOLLOWUP] Fix `illegalNumericPrecedence` 
for ByteType
23e35d4 is described below

commit 23e35d4f6cd75d3cc2a8054f8872fdf61c1a8bcc
Author: DB Tsai 
AuthorDate: Wed Jan 23 17:43:43 2019 -0800

[SPARK-26706][SQL][FOLLOWUP] Fix `illegalNumericPrecedence` for ByteType

Removed automatically generated tests in

```
test("canSafeCast and mayTruncate must be consistent for numeric types")
```

since `canSafeCast` doesn't exist in the 2.3 branch. We have enough test 
coverage in the explicit casting tests.

Authored-by: DB Tsai 
Signed-off-by: DB Tsai 
---
 .../spark/sql/catalyst/expressions/CastSuite.scala | 24 --
 1 file changed, 24 deletions(-)

diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
index 777295c..079fc6b 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
@@ -934,28 +934,4 @@ class CastSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 assert(Cast.mayTruncate(DoubleType, ByteType))
 assert(Cast.mayTruncate(DecimalType.IntDecimal, ByteType))
   }
-
-  test("canSafeCast and mayTruncate must be consistent for numeric types") {
-import DataTypeTestUtils._
-
-def isCastSafe(from: NumericType, to: NumericType): Boolean = (from, to) 
match {
-  case (_, dt: DecimalType) => dt.isWiderThan(from)
-  case (dt: DecimalType, _) => dt.isTighterThan(to)
-  case _ => numericPrecedence.indexOf(from) <= 
numericPrecedence.indexOf(to)
-}
-
-numericTypes.foreach { from =>
-  val (safeTargetTypes, unsafeTargetTypes) = numericTypes.partition(to => 
isCastSafe(from, to))
-
-  safeTargetTypes.foreach { to =>
-assert(Cast.canSafeCast(from, to), s"It should be possible to safely 
cast $from to $to")
-assert(!Cast.mayTruncate(from, to), s"No truncation is expected when 
casting $from to $to")
-  }
-
-  unsafeTargetTypes.foreach { to =>
-assert(!Cast.canSafeCast(from, to), s"It shouldn't be possible to 
safely cast $from to $to")
-assert(Cast.mayTruncate(from, to), s"Truncation is expected when 
casting $from to $to")
-  }
-}
-  }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r32108 - in /dev/spark/3.0.0-SNAPSHOT-2019_01_23_16_28-0df29bf-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2019-01-23 Thread pwendell
Author: pwendell
Date: Thu Jan 24 00:41:10 2019
New Revision: 32108

Log:
Apache Spark 3.0.0-SNAPSHOT-2019_01_23_16_28-0df29bf docs


[This commit notification would consist of 1781 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch branch-2.3 updated: [SPARK-26706][SQL] Fix `illegalNumericPrecedence` for ByteType

2019-01-23 Thread dbtsai
This is an automated email from the ASF dual-hosted git repository.

dbtsai pushed a commit to branch branch-2.3
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.3 by this push:
 new de3b5c4  [SPARK-26706][SQL] Fix `illegalNumericPrecedence` for ByteType
de3b5c4 is described below

commit de3b5c459869e9ff0979e579828e822e6b01f0e3
Author: Anton Okolnychyi 
AuthorDate: Thu Jan 24 00:12:26 2019 +

[SPARK-26706][SQL] Fix `illegalNumericPrecedence` for ByteType

This PR contains a minor change in `Cast$mayTruncate` that fixes its logic 
for bytes.

Right now, `mayTruncate(ByteType, LongType)` returns `false` while 
`mayTruncate(ShortType, LongType)` returns `true`. Consequently, 
`spark.range(1, 3).as[Byte]` and `spark.range(1, 3).as[Short]` behave 
differently.

Potentially, this bug can silently corrupt someone's data.
```scala
// executes silently even though Long is converted into Byte
spark.range(Long.MaxValue - 10, Long.MaxValue).as[Byte]
  .map(b => b - 1)
  .show()
+-+
|value|
+-+
|  -12|
|  -11|
|  -10|
|   -9|
|   -8|
|   -7|
|   -6|
|   -5|
|   -4|
|   -3|
+-+
// throws an AnalysisException: Cannot up cast `id` from bigint to smallint 
as it may truncate
spark.range(Long.MaxValue - 10, Long.MaxValue).as[Short]
  .map(s => s - 1)
  .show()
```
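
As an illustrative note (not part of the patch): `TypeCoercion.numericPrecedence`
lists `ByteType` first, so its index is 0, and the old `toPrecedence > 0` check
could never flag a cast to bytes as truncating. A minimal sketch of the corrected
predicate, modelled with plain strings under that assumption:

```scala
object NumericPrecedenceSketch {
  // Assumes the same ordering as TypeCoercion.numericPrecedence.
  val numericPrecedence = IndexedSeq("Byte", "Short", "Int", "Long", "Float", "Double")

  def illegalNumericPrecedence(from: String, to: String, inclusive: Boolean): Boolean = {
    val fromPrecedence = numericPrecedence.indexOf(from)
    val toPrecedence = numericPrecedence.indexOf(to)
    val toFound = if (inclusive) toPrecedence >= 0 else toPrecedence > 0
    toFound && fromPrecedence > toPrecedence
  }

  def main(args: Array[String]): Unit = {
    // Old check (`> 0`): "Byte" sits at index 0, so Long -> Byte is never flagged.
    println(illegalNumericPrecedence("Long", "Byte", inclusive = false)) // false (the bug)
    // New check (`>= 0`): Long -> Byte is correctly reported as a narrowing cast.
    println(illegalNumericPrecedence("Long", "Byte", inclusive = true))  // true
  }
}
```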

This PR comes with a set of unit tests.

Closes #23632 from aokolnychyi/cast-fix.

Authored-by: Anton Okolnychyi 
Signed-off-by: DB Tsai 
---
 .../spark/sql/catalyst/expressions/Cast.scala  |  2 +-
 .../spark/sql/catalyst/expressions/CastSuite.scala | 36 ++
 .../scala/org/apache/spark/sql/DatasetSuite.scala  |  9 ++
 3 files changed, 46 insertions(+), 1 deletion(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
index 79b0516..5a156c5 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
@@ -130,7 +130,7 @@ object Cast {
   private def illegalNumericPrecedence(from: DataType, to: DataType): Boolean 
= {
 val fromPrecedence = TypeCoercion.numericPrecedence.indexOf(from)
 val toPrecedence = TypeCoercion.numericPrecedence.indexOf(to)
-toPrecedence > 0 && fromPrecedence > toPrecedence
+toPrecedence >= 0 && fromPrecedence > toPrecedence
   }
 
   def forceNullable(from: DataType, to: DataType): Boolean = (from, to) match {
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
index 5b25bdf..777295c 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
@@ -23,6 +23,7 @@ import java.util.{Calendar, Locale, TimeZone}
 import org.apache.spark.SparkFunSuite
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCoercion.numericPrecedence
 import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
 import org.apache.spark.sql.catalyst.util.DateTimeTestUtils._
 import org.apache.spark.sql.catalyst.util.DateTimeUtils
@@ -922,4 +923,39 @@ class CastSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 val ret6 = cast(Literal.create((1, Map(1 -> "a", 2 -> "b", 3 -> "c"))), 
StringType)
 checkEvaluation(ret6, "[1, [1 -> a, 2 -> b, 3 -> c]]")
   }
+
+  test("SPARK-26706: Fix Cast.mayTruncate for bytes") {
+assert(!Cast.mayTruncate(ByteType, ByteType))
+assert(!Cast.mayTruncate(DecimalType.ByteDecimal, ByteType))
+assert(Cast.mayTruncate(ShortType, ByteType))
+assert(Cast.mayTruncate(IntegerType, ByteType))
+assert(Cast.mayTruncate(LongType, ByteType))
+assert(Cast.mayTruncate(FloatType, ByteType))
+assert(Cast.mayTruncate(DoubleType, ByteType))
+assert(Cast.mayTruncate(DecimalType.IntDecimal, ByteType))
+  }
+
+  test("canSafeCast and mayTruncate must be consistent for numeric types") {
+import DataTypeTestUtils._
+
+def isCastSafe(from: NumericType, to: NumericType): Boolean = (from, to) 
match {
+  case (_, dt: DecimalType) => dt.isWiderThan(from)
+  case (dt: DecimalType, _) => dt.isTighterThan(to)
+  case _ => numericPrecedence.indexOf(from) <= 
numericPrecedence.indexOf(to)
+}
+
+numericTypes.foreach { from =>
+  val (safeTargetTypes, unsafeTargetTypes) = numericTypes.partition(to => 
isCastSafe(from, to))
+
+  safeTargetTypes.foreach { to =>
+assert(Cast.canSafeCast(from, to), s"It 

[spark] branch branch-2.4 updated: [SPARK-26706][SQL] Fix `illegalNumericPrecedence` for ByteType

2019-01-23 Thread dbtsai
This is an automated email from the ASF dual-hosted git repository.

dbtsai pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
 new 921c22b  [SPARK-26706][SQL] Fix `illegalNumericPrecedence` for ByteType
921c22b is described below

commit 921c22b1fffc4844aa05c201ba15986be34a3782
Author: Anton Okolnychyi 
AuthorDate: Thu Jan 24 00:12:26 2019 +

[SPARK-26706][SQL] Fix `illegalNumericPrecedence` for ByteType

This PR contains a minor change in `Cast$mayTruncate` that fixes its logic 
for bytes.

Right now, `mayTruncate(ByteType, LongType)` returns `false` while 
`mayTruncate(ShortType, LongType)` returns `true`. Consequently, 
`spark.range(1, 3).as[Byte]` and `spark.range(1, 3).as[Short]` behave 
differently.

Potentially, this bug can silently corrupt someone's data.
```scala
// executes silently even though Long is converted into Byte
spark.range(Long.MaxValue - 10, Long.MaxValue).as[Byte]
  .map(b => b - 1)
  .show()
+-+
|value|
+-+
|  -12|
|  -11|
|  -10|
|   -9|
|   -8|
|   -7|
|   -6|
|   -5|
|   -4|
|   -3|
+-+
// throws an AnalysisException: Cannot up cast `id` from bigint to smallint 
as it may truncate
spark.range(Long.MaxValue - 10, Long.MaxValue).as[Short]
  .map(s => s - 1)
  .show()
```

This PR comes with a set of unit tests.

Closes #23632 from aokolnychyi/cast-fix.

Authored-by: Anton Okolnychyi 
Signed-off-by: DB Tsai 
---
 .../spark/sql/catalyst/expressions/Cast.scala  |  2 +-
 .../spark/sql/catalyst/expressions/CastSuite.scala | 36 ++
 .../scala/org/apache/spark/sql/DatasetSuite.scala  |  9 ++
 3 files changed, 46 insertions(+), 1 deletion(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
index ee463bf..ac02dac 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
@@ -131,7 +131,7 @@ object Cast {
   private def illegalNumericPrecedence(from: DataType, to: DataType): Boolean 
= {
 val fromPrecedence = TypeCoercion.numericPrecedence.indexOf(from)
 val toPrecedence = TypeCoercion.numericPrecedence.indexOf(to)
-toPrecedence > 0 && fromPrecedence > toPrecedence
+toPrecedence >= 0 && fromPrecedence > toPrecedence
   }
 
   /**
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
index d9f32c0..b1531ba 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
@@ -23,6 +23,7 @@ import java.util.{Calendar, Locale, TimeZone}
 import org.apache.spark.SparkFunSuite
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCoercion.numericPrecedence
 import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
 import org.apache.spark.sql.catalyst.util.DateTimeTestUtils._
 import org.apache.spark.sql.catalyst.util.DateTimeUtils
@@ -953,4 +954,39 @@ class CastSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 val ret6 = cast(Literal.create((1, Map(1 -> "a", 2 -> "b", 3 -> "c"))), 
StringType)
 checkEvaluation(ret6, "[1, [1 -> a, 2 -> b, 3 -> c]]")
   }
+
+  test("SPARK-26706: Fix Cast.mayTruncate for bytes") {
+assert(!Cast.mayTruncate(ByteType, ByteType))
+assert(!Cast.mayTruncate(DecimalType.ByteDecimal, ByteType))
+assert(Cast.mayTruncate(ShortType, ByteType))
+assert(Cast.mayTruncate(IntegerType, ByteType))
+assert(Cast.mayTruncate(LongType, ByteType))
+assert(Cast.mayTruncate(FloatType, ByteType))
+assert(Cast.mayTruncate(DoubleType, ByteType))
+assert(Cast.mayTruncate(DecimalType.IntDecimal, ByteType))
+  }
+
+  test("canSafeCast and mayTruncate must be consistent for numeric types") {
+import DataTypeTestUtils._
+
+def isCastSafe(from: NumericType, to: NumericType): Boolean = (from, to) 
match {
+  case (_, dt: DecimalType) => dt.isWiderThan(from)
+  case (dt: DecimalType, _) => dt.isTighterThan(to)
+  case _ => numericPrecedence.indexOf(from) <= 
numericPrecedence.indexOf(to)
+}
+
+numericTypes.foreach { from =>
+  val (safeTargetTypes, unsafeTargetTypes) = numericTypes.partition(to => 
isCastSafe(from, to))
+
+  safeTargetTypes.foreach { to =>
+assert(Cast.canSafeCast(from, to), s"It should be possible to safely 
cast $from to $to")
+

[spark] branch master updated: [SPARK-26706][SQL] Fix `illegalNumericPrecedence` for ByteType

2019-01-23 Thread dbtsai
This is an automated email from the ASF dual-hosted git repository.

dbtsai pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0df29bf  [SPARK-26706][SQL] Fix `illegalNumericPrecedence` for ByteType
0df29bf is described below

commit 0df29bfbdc6ec7f3ba7d7b54e024059b3894da16
Author: Anton Okolnychyi 
AuthorDate: Thu Jan 24 00:12:26 2019 +

[SPARK-26706][SQL] Fix `illegalNumericPrecedence` for ByteType

## What changes were proposed in this pull request?

This PR contains a minor change in `Cast$mayTruncate` that fixes its logic 
for bytes.

Right now, `mayTruncate(ByteType, LongType)` returns `false` while 
`mayTruncate(ShortType, LongType)` returns `true`. Consequently, 
`spark.range(1, 3).as[Byte]` and `spark.range(1, 3).as[Short]` behave 
differently.

Potentially, this bug can silently corrupt someone's data.
```scala
// executes silently even though Long is converted into Byte
spark.range(Long.MaxValue - 10, Long.MaxValue).as[Byte]
  .map(b => b - 1)
  .show()
+-+
|value|
+-+
|  -12|
|  -11|
|  -10|
|   -9|
|   -8|
|   -7|
|   -6|
|   -5|
|   -4|
|   -3|
+-+
// throws an AnalysisException: Cannot up cast `id` from bigint to smallint 
as it may truncate
spark.range(Long.MaxValue - 10, Long.MaxValue).as[Short]
  .map(s => s - 1)
  .show()
```
## How was this patch tested?

This PR comes with a set of unit tests.

Closes #23632 from aokolnychyi/cast-fix.

Authored-by: Anton Okolnychyi 
Signed-off-by: DB Tsai 
---
 .../spark/sql/catalyst/expressions/Cast.scala  |  2 +-
 .../spark/sql/catalyst/expressions/CastSuite.scala | 36 ++
 .../scala/org/apache/spark/sql/DatasetSuite.scala  |  9 ++
 3 files changed, 46 insertions(+), 1 deletion(-)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
index ff6a68b..a6926d8 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Cast.scala
@@ -131,7 +131,7 @@ object Cast {
   private def illegalNumericPrecedence(from: DataType, to: DataType): Boolean 
= {
 val fromPrecedence = TypeCoercion.numericPrecedence.indexOf(from)
 val toPrecedence = TypeCoercion.numericPrecedence.indexOf(to)
-toPrecedence > 0 && fromPrecedence > toPrecedence
+toPrecedence >= 0 && fromPrecedence > toPrecedence
   }
 
   /**
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
index 94dee7e..11956e1 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CastSuite.scala
@@ -25,6 +25,7 @@ import scala.util.Random
 import org.apache.spark.SparkFunSuite
 import org.apache.spark.sql.Row
 import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCoercion.numericPrecedence
 import org.apache.spark.sql.catalyst.expressions.codegen.CodegenContext
 import org.apache.spark.sql.catalyst.util.DateTimeTestUtils._
 import org.apache.spark.sql.catalyst.util.DateTimeUtils
@@ -955,4 +956,39 @@ class CastSuite extends SparkFunSuite with 
ExpressionEvalHelper {
 val ret6 = cast(Literal.create((1, Map(1 -> "a", 2 -> "b", 3 -> "c"))), 
StringType)
 checkEvaluation(ret6, "[1, [1 -> a, 2 -> b, 3 -> c]]")
   }
+
+  test("SPARK-26706: Fix Cast.mayTruncate for bytes") {
+assert(!Cast.mayTruncate(ByteType, ByteType))
+assert(!Cast.mayTruncate(DecimalType.ByteDecimal, ByteType))
+assert(Cast.mayTruncate(ShortType, ByteType))
+assert(Cast.mayTruncate(IntegerType, ByteType))
+assert(Cast.mayTruncate(LongType, ByteType))
+assert(Cast.mayTruncate(FloatType, ByteType))
+assert(Cast.mayTruncate(DoubleType, ByteType))
+assert(Cast.mayTruncate(DecimalType.IntDecimal, ByteType))
+  }
+
+  test("canSafeCast and mayTruncate must be consistent for numeric types") {
+import DataTypeTestUtils._
+
+def isCastSafe(from: NumericType, to: NumericType): Boolean = (from, to) 
match {
+  case (_, dt: DecimalType) => dt.isWiderThan(from)
+  case (dt: DecimalType, _) => dt.isTighterThan(to)
+  case _ => numericPrecedence.indexOf(from) <= 
numericPrecedence.indexOf(to)
+}
+
+numericTypes.foreach { from =>
+  val (safeTargetTypes, unsafeTargetTypes) = numericTypes.partition(to => 
isCastSafe(from, to))
+
+  safeTargetTypes.foreach { to =>
+assert(Cast.canSafeCast(from, to), s"It should be 

svn commit: r32105 - in /dev/spark/3.0.0-SNAPSHOT-2019_01_23_07_25-0446363-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2019-01-23 Thread pwendell
Author: pwendell
Date: Wed Jan 23 15:37:51 2019
New Revision: 32105

Log:
Apache Spark 3.0.0-SNAPSHOT-2019_01_23_07_25-0446363 docs


[This commit notification would consist of 1781 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-26660] Add warning logs when broadcasting large task binary

2019-01-23 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 0446363  [SPARK-26660] Add warning logs when broadcasting large task 
binary
0446363 is described below

commit 0446363ef466922a42de019ca14fada62959c1f3
Author: Liupengcheng 
AuthorDate: Wed Jan 23 08:51:39 2019 -0600

[SPARK-26660] Add warning logs when broadcasting large task binary

## What changes were proposed in this pull request?

Currently, some ML libraries may generate large ML models, which may be 
referenced in the task closure, so the driver will broadcast a large task 
binary, and the executor may not be able to deserialize it, resulting in OOM 
failures (for instance, when the executor's memory is not enough). This problem 
not only affects apps using ML libraries; any user-specified closure or 
function that refers to large data may also have this problem.

In order to facilitate debugging memory problems caused by broadcasting a 
large task binary, we can add some warning logs for it.

This PR adds warning logs on the driver side when broadcasting a large task 
binary, and it also includes some minor log changes in the reading of 
broadcasts.
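
A hedged sketch of the idea (not the DAGScheduler code; the threshold name and
value below are hypothetical stand-ins for TaskSetManager.TASK_SIZE_TO_WARN_KB):

```scala
object BroadcastSizeWarnSketch {
  // Hypothetical threshold playing the role of TaskSetManager.TASK_SIZE_TO_WARN_KB.
  val WarnThresholdKiB: Int = 1000

  private def bytesToString(size: Long): String = f"${size / 1024.0 / 1024.0}%.1f MiB"

  def broadcastWithWarning(taskBinaryBytes: Array[Byte]): Unit = {
    if (taskBinaryBytes.length > WarnThresholdKiB.toLong * 1024) {
      println("WARN Broadcasting large task binary with size " +
        bytesToString(taskBinaryBytes.length))
    }
    // ... the bytes would then be handed to the broadcast mechanism ...
  }

  def main(args: Array[String]): Unit =
    broadcastWithWarning(new Array[Byte](8 * 1024 * 1024)) // ~8 MiB triggers the warning
}
```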

## How was this patch tested?
N/A; just log changes.


Closes #23580 from liupc/Add-warning-logs-for-large-taskBinary-size.

Authored-by: Liupengcheng 
Signed-off-by: Sean Owen 
---
 core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala | 4 +++-
 core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala | 4 
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git 
a/core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala 
b/core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala
index 6410866..7680587 100644
--- a/core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala
+++ b/core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala
@@ -236,7 +236,9 @@ private[spark] class TorrentBroadcast[T: ClassTag](obj: T, 
id: Long)
   throw new SparkException(s"Failed to get locally stored 
broadcast data: $broadcastId")
 }
   case None =>
-logInfo("Started reading broadcast variable " + id)
+val estimatedTotalSize = Utils.bytesToString(numBlocks * blockSize)
+logInfo(s"Started reading broadcast variable $id with $numBlocks 
pieces " +
+  s"(estimated total size $estimatedTotalSize)")
 val startTimeMs = System.currentTimeMillis()
 val blocks = readBlocks()
 logInfo("Reading broadcast variable " + id + " took" + 
Utils.getUsedTimeMs(startTimeMs))
diff --git a/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala 
b/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
index f6ade18..ecb8ac0 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
@@ -1162,6 +1162,10 @@ private[spark] class DAGScheduler(
 partitions = stage.rdd.partitions
   }
 
+  if (taskBinaryBytes.length * 1000 > TaskSetManager.TASK_SIZE_TO_WARN_KB) 
{
+logWarning(s"Broadcasting large task binary with size " +
+  s"${Utils.bytesToString(taskBinaryBytes.length)}")
+  }
   taskBinary = sc.broadcast(taskBinaryBytes)
 } catch {
   // In the case of a failure during serialization, abort the stage.


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-26681][SQL] Support Ammonite inner-class scopes.

2019-01-23 Thread srowen
This is an automated email from the ASF dual-hosted git repository.

srowen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new d008e23  [SPARK-26681][SQL] Support Ammonite inner-class scopes.
d008e23 is described below

commit d008e23ab50637ebe54eebe8784f5410f15486cc
Author: Ryan Blue 
AuthorDate: Wed Jan 23 08:50:03 2019 -0600

[SPARK-26681][SQL] Support Ammonite inner-class scopes.

## What changes were proposed in this pull request?

This adds a new pattern to recognize Ammonite REPL classes and return the 
correct scope.
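
An illustrative sketch of how a pattern like the one added in OuterScopes pulls
the cell prefix out of an Ammonite-generated wrapper class name (the class name
below is a made-up example):

```scala
object AmmoniteScopeSketch {
  // Same shape as the regex added by the patch: capture `ammonite.$sess.cmdN$`.
  private val AmmoniteREPLClass = """^(ammonite\.\$sess\.cmd(?:\d+)\$).*""".r

  def main(args: Array[String]): Unit = {
    "ammonite.$sess.cmd8$Helper$Foo" match {
      case AmmoniteREPLClass(cellPrefix) =>
        // The outer scope is then resolved by reflecting on the object named by
        // the captured prefix (as the patch does via its `instance` method).
        println(s"matched Ammonite cell prefix: $cellPrefix") // ammonite.$sess.cmd8$
      case _ =>
        println("not an Ammonite REPL class")
    }
  }
}
```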

## How was this patch tested?

Manually tested with Spark in an Ammonite session.

Closes #23607 from rdblue/SPARK-26681-support-ammonite-scopes.

Authored-by: Ryan Blue 
Signed-off-by: Sean Owen 
---
 .../org/apache/spark/sql/catalyst/encoders/OuterScopes.scala   | 10 ++
 1 file changed, 10 insertions(+)

diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/OuterScopes.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/OuterScopes.scala
index a1f0312..665b2cd 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/OuterScopes.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/OuterScopes.scala
@@ -53,6 +53,12 @@ object OuterScopes {
 val outer = outerScopes.get(outerClassName)
 if (outer == null) {
   outerClassName match {
+case AmmoniteREPLClass(cellClassName) =>
+  () => {
+val objClass = Utils.classForName(cellClassName)
+val objInstance = objClass.getField("MODULE$").get(null)
+objClass.getMethod("instance").invoke(objInstance)
+  }
 // If the outer class is generated by REPL, users don't need to 
register it as it has
 // only one instance and there is a way to retrieve it: get the 
`$read` object, call the
 // `INSTANCE()` method to get the single instance of class `$read`. 
Then call `$iw()`
@@ -95,4 +101,8 @@ object OuterScopes {
 
   // The format of REPL generated wrapper class's name, e.g. 
`$line12.$read$$iw$$iw`
   private[this] val REPLClass = """^(\$line(?:\d+)\.\$read)(?:\$\$iw)+$""".r
+
+  // The format of ammonite REPL generated wrapper class's name,
+  // e.g. `ammonite.$sess.cmd8$Helper$Foo` -> 
`ammonite.$sess.cmd8.instance.Foo`
+  private[this] val AmmoniteREPLClass = 
"""^(ammonite\.\$sess\.cmd(?:\d+)\$).*""".r
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



[spark] branch master updated: [SPARK-26653][SQL] Use Proleptic Gregorian calendar in parsing JDBC lower/upper bounds

2019-01-23 Thread wenchen
This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 46d5bb9  [SPARK-26653][SQL] Use Proleptic Gregorian calendar in 
parsing JDBC lower/upper bounds
46d5bb9 is described below

commit 46d5bb9a0fa4f44ea857e2cb15bf15acd773b839
Author: Maxim Gekk 
AuthorDate: Wed Jan 23 20:23:17 2019 +0800

[SPARK-26653][SQL] Use Proleptic Gregorian calendar in parsing JDBC 
lower/upper bounds

## What changes were proposed in this pull request?

In the PR, I propose using the `stringToDate` and `stringToTimestamp` 
methods to parse the JDBC lower/upper bounds of the partition column if it has 
`DateType` or `TimestampType`. Since those methods were ported to the Proleptic 
Gregorian calendar by #23512, the PR switches parsing of the JDBC bounds 
of the partition column to that calendar as well.
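
A hedged usage sketch of the options this change affects (the connection URL,
table, and column below are hypothetical): when `partitionColumn` is a
date/timestamp column, the `lowerBound`/`upperBound` strings are now parsed the
same way as casting strings to DateType/TimestampType, using the Proleptic
Gregorian calendar and the `spark.sql.session.timeZone` time zone.

```scala
import org.apache.spark.sql.SparkSession

object JdbcTimestampBoundsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("jdbc-bounds-sketch")
      .config("spark.sql.session.timeZone", "UTC") // the zone used to parse the bounds
      .getOrCreate()

    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:postgresql://db-host:5432/events") // hypothetical database
      .option("dbtable", "public.events")
      .option("partitionColumn", "event_ts")       // a TIMESTAMP column
      .option("lowerBound", "2019-01-01 00:00:00") // parsed via stringToTimestamp
      .option("upperBound", "2019-02-01 00:00:00")
      .option("numPartitions", "8")
      .load()

    df.printSchema()
    spark.stop()
  }
}
```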

## How was this patch tested?

This was tested by `JDBCSuite`.

Closes #23597 from MaxGekk/jdbc-parse-timestamp-bounds.

Lead-authored-by: Maxim Gekk 
Co-authored-by: Maxim Gekk 
Signed-off-by: Wenchen Fan 
---
 docs/sql-migration-guide-upgrade.md|  2 ++
 .../execution/datasources/jdbc/JDBCRelation.scala  | 27 -
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala  | 34 +-
 3 files changed, 54 insertions(+), 9 deletions(-)

diff --git a/docs/sql-migration-guide-upgrade.md 
b/docs/sql-migration-guide-upgrade.md
index d442087..98fc9fa 100644
--- a/docs/sql-migration-guide-upgrade.md
+++ b/docs/sql-migration-guide-upgrade.md
@@ -93,6 +93,8 @@ displayTitle: Spark SQL Upgrading Guide
 
   - Since Spark 3.0, the `weekofyear`, `weekday` and `dayofweek` functions use 
java.time API for calculation week number of year and day number of week based 
on Proleptic Gregorian calendar. In Spark version 2.4 and earlier, the hybrid 
calendar (Julian + Gregorian) is used for the same purpose. Results of the 
functions returned by Spark 3.0 and previous versions can be different for 
dates before October 15, 1582 (Gregorian).
 
+  - Since Spark 3.0, the JDBC options `lowerBound` and `upperBound` are 
converted to TimestampType/DateType values in the same way as casting strings 
to TimestampType/DateType values. The conversion is based on Proleptic 
Gregorian calendar, and time zone defined by the SQL config 
`spark.sql.session.timeZone`. In Spark version 2.4 and earlier, the conversion 
is based on the hybrid calendar (Julian + Gregorian) and on default system time 
zone.
+
 ## Upgrading From Spark SQL 2.3 to 2.4
 
   - In Spark version 2.3 and earlier, the second parameter to array_contains 
function is implicitly promoted to the element type of first array type 
parameter. This type promotion can be lossy and may cause `array_contains` 
function to return wrong result. This problem has been addressed in 2.4 by 
employing a safer type promotion mechanism. This can cause some change in 
behavior and are illustrated in the table below.
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala
index 13ed105..c0f78b5 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCRelation.scala
@@ -17,8 +17,6 @@
 
 package org.apache.spark.sql.execution.datasources.jdbc
 
-import java.sql.{Date, Timestamp}
-
 import scala.collection.mutable.ArrayBuffer
 
 import org.apache.spark.Partition
@@ -27,10 +25,12 @@ import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.{AnalysisException, DataFrame, Row, SaveMode, 
SparkSession, SQLContext}
 import org.apache.spark.sql.catalyst.analysis._
 import org.apache.spark.sql.catalyst.util.{DateFormatter, DateTimeUtils, 
TimestampFormatter}
+import org.apache.spark.sql.catalyst.util.DateTimeUtils.{getTimeZone, 
stringToDate, stringToTimestamp}
 import org.apache.spark.sql.internal.SQLConf
 import org.apache.spark.sql.jdbc.JdbcDialects
 import org.apache.spark.sql.sources._
 import org.apache.spark.sql.types.{DataType, DateType, NumericType, 
StructType, TimestampType}
+import org.apache.spark.unsafe.types.UTF8String
 
 /**
  * Instructions on how to partition the table among workers.
@@ -85,8 +85,8 @@ private[sql] object JDBCRelation extends Logging {
 val (column, columnType) = verifyAndGetNormalizedPartitionColumn(
   schema, partitionColumn.get, resolver, jdbcOptions)
 
-val lowerBoundValue = toInternalBoundValue(lowerBound.get, columnType)
-val upperBoundValue = toInternalBoundValue(upperBound.get, columnType)
+val lowerBoundValue = toInternalBoundValue(lowerBound.get, columnType, 
timeZoneId)
+val 

svn commit: r32102 - in /dev/spark/3.0.0-SNAPSHOT-2019_01_23_02_49-1ed1b4d-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2019-01-23 Thread pwendell
Author: pwendell
Date: Wed Jan 23 11:01:13 2019
New Revision: 32102

Log:
Apache Spark 3.0.0-SNAPSHOT-2019_01_23_02_49-1ed1b4d docs


[This commit notification would consist of 1781 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org