[spark] branch master updated: [SPARK-26546][SQL] Caching of java.time.format.DateTimeFormatter

2019-01-09 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 73c7b12  [SPARK-26546][SQL] Caching of java.time.format.DateTimeFormatter
73c7b12 is described below

commit 73c7b126c6f477b38eba98232f2c8389a68676b8
Author: Maxim Gekk 
AuthorDate: Thu Jan 10 10:32:20 2019 +0800

[SPARK-26546][SQL] Caching of java.time.format.DateTimeFormatter

## What changes were proposed in this pull request?

Added a cache for `java.time.format.DateTimeFormatter` instances, keyed by pattern and locale. This avoids re-parsing timestamp/date patterns each time a new instance of `TimestampFormatter`/`DateFormatter` is created.
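
The pattern boils down to the standalone sketch below (condensed from the diff later in this mail; the cache size and the racy get-then-put mirror the patch, the object name is made up here, and the builder options are abbreviated):

```
import java.time.format.{DateTimeFormatter, DateTimeFormatterBuilder}
import java.util.Locale

import com.google.common.cache.CacheBuilder

object FormatterCacheSketch {
  // Cache keyed by (pattern, locale); 128 mirrors the size used in the patch.
  private val cache = CacheBuilder.newBuilder()
    .maximumSize(128)
    .build[(String, Locale), DateTimeFormatter]()

  // Abbreviated builder: the real helper also sets chronology and resolver style.
  private def buildFormatter(pattern: String, locale: Locale): DateTimeFormatter = {
    new DateTimeFormatterBuilder()
      .parseCaseInsensitive()
      .appendPattern(pattern)
      .toFormatter(locale)
  }

  // Racy but safe: DateTimeFormatter is immutable, so building one twice under
  // contention only wastes a little work and never corrupts the cache.
  def getOrCreateFormatter(pattern: String, locale: Locale): DateTimeFormatter = {
    val key = (pattern, locale)
    var formatter = cache.getIfPresent(key)
    if (formatter == null) {
      formatter = buildFormatter(pattern, locale)
      cache.put(key, formatter)
    }
    formatter
  }
}
```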

## How was this patch tested?

By existing test suites `TimestampFormatterSuite`/`DateFormatterSuite` and 
`JsonFunctionsSuite`/`JsonSuite`.

Closes #23462 from MaxGekk/time-formatter-caching.

Lead-authored-by: Maxim Gekk 
Co-authored-by: Maxim Gekk 
Signed-off-by: Hyukjin Kwon 
---
 .../spark/sql/catalyst/util/DateFormatter.scala|  2 +-
 .../catalyst/util/DateTimeFormatterHelper.scala| 51 --
 .../sql/catalyst/util/TimestampFormatter.scala |  2 +-
 3 files changed, 40 insertions(+), 15 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateFormatter.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateFormatter.scala
index b4c9967..db92552 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateFormatter.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateFormatter.scala
@@ -36,7 +36,7 @@ class Iso8601DateFormatter(
 locale: Locale) extends DateFormatter with DateTimeFormatterHelper {
 
   @transient
-  private lazy val formatter = buildFormatter(pattern, locale)
+  private lazy val formatter = getOrCreateFormatter(pattern, locale)
   private val UTC = ZoneId.of("UTC")
 
   private def toInstant(s: String): Instant = {
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
index 91cc57e..81ad6ad 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeFormatterHelper.scala
@@ -23,9 +23,46 @@ import java.time.format.{DateTimeFormatter, DateTimeFormatterBuilder, ResolverSt
 import java.time.temporal.{ChronoField, TemporalAccessor, TemporalQueries}
 import java.util.Locale
 
+import com.google.common.cache.CacheBuilder
+
+import org.apache.spark.sql.catalyst.util.DateTimeFormatterHelper._
+
 trait DateTimeFormatterHelper {
+  protected def toInstantWithZoneId(temporalAccessor: TemporalAccessor, zoneId: ZoneId): Instant = {
+    val localTime = if (temporalAccessor.query(TemporalQueries.localTime) == null) {
+      LocalTime.ofNanoOfDay(0)
+    } else {
+      LocalTime.from(temporalAccessor)
+    }
+    val localDate = LocalDate.from(temporalAccessor)
+    val localDateTime = LocalDateTime.of(localDate, localTime)
+    val zonedDateTime = ZonedDateTime.of(localDateTime, zoneId)
+    Instant.from(zonedDateTime)
+  }
+
+  // Gets a formatter from the cache or creates new one. The buildFormatter method can be called
+  // a few times with the same parameters in parallel if the cache does not contain values
+  // associated to those parameters. Since the formatter is immutable, it does not matter.
+  // In this way, synchronised is intentionally omitted in this method to make parallel calls
+  // less synchronised.
+  // The Cache.get method is not used here to avoid creation of additional instances of Callable.
+  protected def getOrCreateFormatter(pattern: String, locale: Locale): DateTimeFormatter = {
+    val key = (pattern, locale)
+    var formatter = cache.getIfPresent(key)
+    if (formatter == null) {
+      formatter = buildFormatter(pattern, locale)
+      cache.put(key, formatter)
+    }
+    formatter
+  }
+}
 
-  protected def buildFormatter(pattern: String, locale: Locale): DateTimeFormatter = {
+private object DateTimeFormatterHelper {
+  val cache = CacheBuilder.newBuilder()
+    .maximumSize(128)
+    .build[(String, Locale), DateTimeFormatter]()
+
+  def buildFormatter(pattern: String, locale: Locale): DateTimeFormatter = {
     new DateTimeFormatterBuilder()
       .parseCaseInsensitive()
       .appendPattern(pattern)
@@ -38,16 +75,4 @@ trait DateTimeFormatterHelper {
       .withChronology(IsoChronology.INSTANCE)
       .withResolverStyle(ResolverStyle.STRICT)
   }
-
-  protected def toInstantWithZoneId(temporalAccessor: TemporalAccessor, zoneId: ZoneId): Instant = {
-    val 

[spark] branch master updated: [SPARK-26065][FOLLOW-UP][SQL] Fix the Failure when having two Consecutive Hints

2019-01-09 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 2d01bcc  [SPARK-26065][FOLLOW-UP][SQL] Fix the Failure when having two Consecutive Hints
2d01bcc is described below

commit 2d01bccbd4c93bfbfa1a9e618fcb795b7106f01c
Author: maryannxue 
AuthorDate: Wed Jan 9 14:31:26 2019 -0800

[SPARK-26065][FOLLOW-UP][SQL] Fix the Failure when having two Consecutive Hints

## What changes were proposed in this pull request?

This fixes a bug introduced in https://github.com/apache/spark/pull/23036 that could lead to an exception when a query contains two consecutive hints.
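
For illustration, the query shape that exercises the fix mirrors the test added below, where one DataFrame carries two hints in a row (a hedged sketch, not part of the commit; the session setup is assumed):

```
import org.apache.spark.sql.SparkSession

object ConsecutiveHintsExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("consecutive-hints").getOrCreate()
    import spark.implicits._

    val df = spark.range(10).toDF("id")
    // Two consecutive hints on the same plan: before this fix, the rule that strips
    // ResolvedHint nodes used a top-down transform and could leave the inner hint behind.
    val hinted = df.hint("broadcast").hint("broadcast").filter($"id" > 2).join(df, "id")
    hinted.explain(true)

    spark.stop()
  }
}
```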

## How was this patch tested?

Added a new test.

Closes #23501 from maryannxue/query-hint-followup.

Authored-by: maryannxue 
Signed-off-by: gatorsmile 
---
 .../spark/sql/catalyst/optimizer/EliminateResolvedHint.scala | 2 +-
 sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala | 9 +
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/EliminateResolvedHint.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/EliminateResolvedHint.scala
index bbe4eee..a136f04 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/EliminateResolvedHint.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/EliminateResolvedHint.scala
@@ -34,7 +34,7 @@ object EliminateResolvedHint extends Rule[LogicalPlan] {
 val rightHint = mergeHints(collectHints(j.right))
 j.copy(hint = JoinHint(leftHint, rightHint))
 }
-pulledUp.transform {
+pulledUp.transformUp {
   case h: ResolvedHint => h.child
 }
   }
diff --git a/sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala b/sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala
index 3652895..55f210c 100644
--- a/sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala
+++ b/sql/core/src/test/scala/org/apache/spark/sql/JoinHintSuite.scala
@@ -190,4 +190,13 @@ class JoinHintSuite extends PlanTest with SharedSQLContext {
 Some(HintInfo(broadcast = true))) :: Nil
 )
   }
+
+  test("nested hint") {
+    verifyJoinHint(
+      df.hint("broadcast").hint("broadcast").filter('id > 2).join(df, "id"),
+      JoinHint(
+        Some(HintInfo(broadcast = true)),
+        None) :: Nil
+    )
+  }
 }





svn commit: r31856 - in /dev/spark/3.0.0-SNAPSHOT-2019_01_09_18_10-2d01bcc-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2019-01-09 Thread pwendell
Author: pwendell
Date: Thu Jan 10 02:23:09 2019
New Revision: 31856

Log:
Apache Spark 3.0.0-SNAPSHOT-2019_01_09_18_10-2d01bcc docs


[This commit notification would consist of 1775 parts, which exceeds the limit of 50, so it was shortened to this summary.]




[spark] branch master updated: [SPARK-26493][SQL] Allow multiple spark.sql.extensions

2019-01-09 Thread gurwls223
This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 1a47233  [SPARK-26493][SQL] Allow multiple spark.sql.extensions
1a47233 is described below

commit 1a47233f998cd26bac06fa5529a1755a3758d198
Author: Jamison Bennett 
AuthorDate: Thu Jan 10 10:23:03 2019 +0800

[SPARK-26493][SQL] Allow multiple spark.sql.extensions

## What changes were proposed in this pull request?

Allow multiple spark.sql.extensions to be specified in the configuration.
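
As a usage sketch of the new behaviour (class and package names below are made up for illustration; only the comma-separated `spark.sql.extensions` value reflects this change):

```
package com.example.extensions

import org.apache.spark.sql.{SparkSession, SparkSessionExtensions}

// Each extension is a Function1[SparkSessionExtensions, Unit] with a no-args constructor.
class MyFirstExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(ext: SparkSessionExtensions): Unit = {
    // e.g. ext.injectResolutionRule(...), ext.injectParser(...)
  }
}

class MySecondExtensions extends (SparkSessionExtensions => Unit) {
  override def apply(ext: SparkSessionExtensions): Unit = {}
}

object MultipleExtensionsExample {
  def main(args: Array[String]): Unit = {
    // Extensions are applied in the listed order; for parsers the last one wins
    // and may delegate to its predecessor.
    val spark = SparkSession.builder()
      .master("local[*]")
      .config("spark.sql.extensions",
        "com.example.extensions.MyFirstExtensions,com.example.extensions.MySecondExtensions")
      .getOrCreate()

    spark.sql("SELECT 1").show()
    spark.stop()
  }
}
```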

## How was this patch tested?

New tests are added.

Closes #23398 from jamisonbennett/SPARK-26493.

Authored-by: Jamison Bennett 
Signed-off-by: Hyukjin Kwon 
---
 .../sql/catalyst/analysis/FunctionRegistry.scala   |  11 +-
 .../apache/spark/sql/internal/StaticSQLConf.scala  |  10 +-
 .../scala/org/apache/spark/sql/SparkSession.scala  |  11 +-
 .../apache/spark/sql/SparkSessionExtensions.scala  |  24 ++-
 .../spark/sql/SparkSessionExtensionSuite.scala | 167 +++--
 5 files changed, 198 insertions(+), 25 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
index c79f990..befc02f 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
@@ -25,6 +25,7 @@ import scala.language.existentials
 import scala.reflect.ClassTag
 import scala.util.{Failure, Success, Try}
 
+import org.apache.spark.internal.Logging
 import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.catalyst.FunctionIdentifier
 import org.apache.spark.sql.catalyst.analysis.FunctionRegistry.FunctionBuilder
@@ -87,7 +88,7 @@ trait FunctionRegistry {
   override def clone(): FunctionRegistry = throw new CloneNotSupportedException()
 }
 
-class SimpleFunctionRegistry extends FunctionRegistry {
+class SimpleFunctionRegistry extends FunctionRegistry with Logging {
 
   @GuardedBy("this")
   private val functionBuilders =
@@ -103,7 +104,13 @@ class SimpleFunctionRegistry extends FunctionRegistry {
   name: FunctionIdentifier,
   info: ExpressionInfo,
   builder: FunctionBuilder): Unit = synchronized {
-    functionBuilders.put(normalizeFuncName(name), (info, builder))
+    val normalizedName = normalizeFuncName(name)
+    val newFunction = (info, builder)
+    functionBuilders.put(normalizedName, newFunction) match {
+      case Some(previousFunction) if previousFunction != newFunction =>
+        logWarning(s"The function $normalizedName replaced a previously registered function.")
+      case _ =>
+    }
   }
 
   override def lookupFunction(name: FunctionIdentifier, children: Seq[Expression]): Expression = {
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala
index d9c354b..0a8dc28 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/StaticSQLConf.scala
@@ -99,9 +99,15 @@ object StaticSQLConf {
   .createWithDefault(false)
 
   val SPARK_SESSION_EXTENSIONS = buildStaticConf("spark.sql.extensions")
-    .doc("Name of the class used to configure Spark Session extensions. The class should " +
-      "implement Function1[SparkSessionExtension, Unit], and must have a no-args constructor.")
+    .doc("A comma-separated list of classes that implement " +
+      "Function1[SparkSessionExtension, Unit] used to configure Spark Session extensions. The " +
+      "classes must have a no-args constructor. If multiple extensions are specified, they are " +
+      "applied in the specified order. For the case of rules and planner strategies, they are " +
+      "applied in the specified order. For the case of parsers, the last parser is used and each " +
+      "parser can delegate to its predecessor. For the case of function name conflicts, the last " +
+      "registered function name is used.")
     .stringConf
+    .toSequence
     .createOptional
 
   val QUERY_EXECUTION_LISTENERS = buildStaticConf("spark.sql.queryExecutionListeners")
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala b/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
index 26272c3..1c13a68 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/SparkSession.scala
@@ -93,7 +93,7 @@ class SparkSession private(
   private[sql] def this(sc: SparkContext) {
 this(sc, None, None,
   

[spark] branch master updated (73c7b12 -> b316ebf)

2019-01-09 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.


 from 73c7b12  [SPARK-26546][SQL] Caching of java.time.format.DateTimeFormatter
 add b316ebf  [SPARK-26491][K8S][FOLLOWUP] Fix compile failure

No new revisions were added by this update.

Summary of changes:
 .../spark/deploy/k8s/integrationtest/KubernetesTestComponents.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)





svn commit: r31858 - in /dev/spark/3.0.0-SNAPSHOT-2019_01_09_22_19-73c7b12-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2019-01-09 Thread pwendell
Author: pwendell
Date: Thu Jan 10 06:31:41 2019
New Revision: 31858

Log:
Apache Spark 3.0.0-SNAPSHOT-2019_01_09_22_19-73c7b12 docs


[This commit notification would consist of 1775 parts, which exceeds the limit of 50, so it was shortened to this summary.]




[spark] branch master updated: [SPARK-25484][SQL][TEST] Refactor ExternalAppendOnlyUnsafeRowArrayBenchmark

2019-01-09 Thread dongjoon
This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new 49c062b  [SPARK-25484][SQL][TEST] Refactor ExternalAppendOnlyUnsafeRowArrayBenchmark
49c062b is described below

commit 49c062b2e0487b13b732b18edde105e1f000c20d
Author: Peter Toth 
AuthorDate: Wed Jan 9 09:54:21 2019 -0800

[SPARK-25484][SQL][TEST] Refactor ExternalAppendOnlyUnsafeRowArrayBenchmark

## What changes were proposed in this pull request?

Refactor ExternalAppendOnlyUnsafeRowArrayBenchmark to use main method.

## How was this patch tested?

Manually tested and regenerated results.
Please note that `spark.memory.debugFill` setting has a huge impact on this 
benchmark. Since it is set to true by default when running the benchmark from 
SBT, we need to disable it:
```
SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt ";project sql;set javaOptions in Test += \"-Dspark.memory.debugFill=false\";test:runMain org.apache.spark.sql.execution.ExternalAppendOnlyUnsafeRowArrayBenchmark"
```

Closes #22617 from peter-toth/SPARK-25484.

Lead-authored-by: Peter Toth 
Co-authored-by: Peter Toth 
Co-authored-by: Dongjoon Hyun 
Signed-off-by: Dongjoon Hyun 
---
 ...alAppendOnlyUnsafeRowArrayBenchmark-results.txt |  45 ++
 ...ExternalAppendOnlyUnsafeRowArrayBenchmark.scala | 158 -
 2 files changed, 105 insertions(+), 98 deletions(-)

diff --git a/sql/core/benchmarks/ExternalAppendOnlyUnsafeRowArrayBenchmark-results.txt b/sql/core/benchmarks/ExternalAppendOnlyUnsafeRowArrayBenchmark-results.txt
new file mode 100644
index 000..02c6b72
--- /dev/null
+++ b/sql/core/benchmarks/ExternalAppendOnlyUnsafeRowArrayBenchmark-results.txt
@@ -0,0 +1,45 @@
+
+WITHOUT SPILL
+
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Array with 10 rows:                  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+
+ArrayBuffer                                   6378 / 6550         16.1          62.3       1.0X
+ExternalAppendOnlyUnsafeRowArray              6196 / 6242         16.5          60.5       1.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Array with 1000 rows:                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+
+ArrayBuffer                                 11988 / 12027        21.9          45.7       1.0X
+ExternalAppendOnlyUnsafeRowArray            37480 / 37574         7.0         143.0       0.3X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Array with 3 rows:                   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+
+ArrayBuffer                                 23536 / 23538        20.9          47.9       1.0X
+ExternalAppendOnlyUnsafeRowArray            31275 / 31277        15.7          63.6       0.8X
+
+
+
+WITH SPILL
+
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Spilling with 1000 rows:             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+
+UnsafeExternalSorter                        29241 / 29279         9.0         111.5       1.0X
+ExternalAppendOnlyUnsafeRowArray            14309 / 14313        18.3          54.6       2.0X
+
+OpenJDK 64-Bit Server VM 1.8.0_191-b12 on Linux 3.10.0-862.3.2.el7.x86_64
+Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz
+Spilling with 1 rows:                Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
+
+UnsafeExternalSorter                           11 /   11         14.8          67.4       1.0X
+ExternalAppendOnlyUnsafeRowArray                9 /    9         17.6          56.8       1.2X
+
+
diff --git 

[spark] branch master updated: [SPARK-26448][SQL] retain the difference between 0.0 and -0.0

2019-01-09 Thread lixiao
This is an automated email from the ASF dual-hosted git repository.

lixiao pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
 new e853afb  [SPARK-26448][SQL] retain the difference between 0.0 and -0.0
e853afb is described below

commit e853afb416cdbfa3078c00295b0b3da4bcaed62e
Author: Wenchen Fan 
AuthorDate: Wed Jan 9 13:50:32 2019 -0800

[SPARK-26448][SQL] retain the difference between 0.0 and -0.0

## What changes were proposed in this pull request?

In https://github.com/apache/spark/pull/23043, we introduced a behavior change: Spark users are no longer able to distinguish 0.0 and -0.0.

This PR proposes an alternative fix for the original bug that retains the difference between 0.0 and -0.0 inside Spark.

The idea is that we can rewrite the window partition keys, join keys and grouping keys during the logical phase to normalize the special floating-point values. Thus only operators that care about special floating-point values pay the performance overhead, and end users can still distinguish -0.0.
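
Per the migration guide entry updated in this commit, the visible effect is the grouping example below (a minimal sketch assuming a local session; object name is illustrative):

```
import org.apache.spark.sql.SparkSession

object NegativeZeroGroupingExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("neg-zero").getOrCreate()
    import spark.implicits._

    // Since Spark 3.0 the grouping key normalizes -0.0 to 0.0, so this returns [(0.0, 2)];
    // Spark 2.4 and earlier returned [(0.0, 1), (-0.0, 1)].
    Seq(-0.0, 0.0).toDF("d").groupBy("d").count().show()

    spark.stop()
  }
}
```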

## How was this patch tested?

Existing tests.

Closes #23388 from cloud-fan/minor.

Authored-by: Wenchen Fan 
Signed-off-by: gatorsmile 
---
 docs/sql-migration-guide-upgrade.md|   2 +-
 .../catalyst/expressions/codegen/UnsafeWriter.java |  35 
 .../optimizer/NormalizeFloatingNumbers.scala   | 198 +
 .../spark/sql/catalyst/optimizer/Optimizer.scala   |   7 +-
 .../spark/sql/catalyst/planning/patterns.scala |   5 +-
 .../expressions/UnsafeRowConverterSuite.scala  |  16 --
 .../expressions/codegen/UnsafeRowWriterSuite.scala |  21 ---
 .../spark/sql/execution/aggregate/AggUtils.scala   |  16 +-
 .../apache/spark/sql/DataFrameAggregateSuite.scala |  59 --
 .../org/apache/spark/sql/DataFrameJoinSuite.scala  |  12 --
 .../spark/sql/DataFrameWindowFunctionsSuite.scala  |  60 +--
 .../apache/spark/sql/DatasetPrimitiveSuite.scala   |  45 +++--
 .../scala/org/apache/spark/sql/JoinSuite.scala |  60 +++
 .../scala/org/apache/spark/sql/QueryTest.scala |  51 +++---
 .../sql/hive/execution/AggregationQuerySuite.scala |   2 +-
 15 files changed, 436 insertions(+), 153 deletions(-)

diff --git a/docs/sql-migration-guide-upgrade.md b/docs/sql-migration-guide-upgrade.md
index 0fcdd42..4e36fd4 100644
--- a/docs/sql-migration-guide-upgrade.md
+++ b/docs/sql-migration-guide-upgrade.md
@@ -25,7 +25,7 @@ displayTitle: Spark SQL Upgrading Guide
 
   - In Spark version 2.4 and earlier, `Dataset.groupByKey` results to a grouped dataset with key attribute wrongly named as "value", if the key is non-struct type, e.g. int, string, array, etc. This is counterintuitive and makes the schema of aggregation queries weird. For example, the schema of `ds.groupByKey(...).count()` is `(value, count)`. Since Spark 3.0, we name the grouping attribute to "key". The old behaviour is preserved under a newly added configuration `spark.sql.legacy.data [...]
 
-  - In Spark version 2.4 and earlier, float/double -0.0 is semantically equal to 0.0, but users can still distinguish them via `Dataset.show`, `Dataset.collect` etc. Since Spark 3.0, float/double -0.0 is replaced by 0.0 internally, and users can't distinguish them any more.
+  - In Spark version 2.4 and earlier, float/double -0.0 is semantically equal to 0.0, but -0.0 and 0.0 are considered as different values when used in aggregate grouping keys, window partition keys and join keys. Since Spark 3.0, this bug is fixed. For example, `Seq(-0.0, 0.0).toDF("d").groupBy("d").count()` returns `[(0.0, 2)]` in Spark 3.0, and `[(0.0, 1), (-0.0, 1)]` in Spark 2.4 and earlier.
 
   - In Spark version 2.4 and earlier, users can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, e.g. map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. Since Spark 3.0, these built-in functions will remove duplicated map keys with last wins policy. Users may still read map values wit [...]
 
diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeWriter.java b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeWriter.java
index 7553ab8..95263a0 100644
--- a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeWriter.java
+++ b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeWriter.java
@@ -198,46 +198,11 @@ public abstract class UnsafeWriter {
 Platform.putLong(getBuffer(), offset, value);
   }
 
-  // We need to take care of NaN and -0.0 in several places:
-  //   1. When compare values, different NaNs should be treated as same, 

svn commit: r31853 - in /dev/spark/3.0.0-SNAPSHOT-2019_01_09_13_59-e853afb-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2019-01-09 Thread pwendell
Author: pwendell
Date: Wed Jan  9 22:11:35 2019
New Revision: 31853

Log:
Apache Spark 3.0.0-SNAPSHOT-2019_01_09_13_59-e853afb docs


[This commit notification would consist of 1775 parts, which exceeds the limit of 50, so it was shortened to this summary.]
