spark git commit: [SPARK-23361][YARN] Allow AM to restart after initial tokens expire.

2018-03-22 Thread jshao
Repository: spark
Updated Branches:
  refs/heads/master b2edc30db -> 5fa438471


[SPARK-23361][YARN] Allow AM to restart after initial tokens expire.

Currently, the Spark AM relies on the initial set of tokens created by
the submission client to be able to talk to HDFS and other services that
require delegation tokens. This means that after those tokens expire, a
new AM will fail to start (e.g. when there is an application failure and
re-attempts are enabled).

This PR makes it so that the first thing the AM does when the user provides
a principal and keytab is to create new delegation tokens for use. This
makes sure that the AM can be started irrespective of how old the original
token set is. It also allows all of the token management to be done by the
AM - there is no need for the submission client to set configuration values
to tell the AM when to renew tokens.
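
As a hedged illustration of that first step (not the AM's actual code path), the Hadoop APIs
involved look roughly like this; the helper name and the "yarn" renewer placeholder are
illustrative, not taken from the patch:

```scala
import java.security.PrivilegedExceptionAction

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

// Sketch only: log in from the keytab and create a brand-new set of HDFS delegation tokens,
// so AM startup does not depend on the (possibly expired) tokens shipped by the submission client.
def obtainFreshTokens(principal: String, keytab: String, hadoopConf: Configuration): Credentials = {
  val loginUgi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytab)
  val creds = new Credentials()
  loginUgi.doAs(new PrivilegedExceptionAction[Unit] {
    override def run(): Unit = {
      // The renewer is normally the YARN RM principal; "yarn" is just a placeholder here.
      FileSystem.get(hadoopConf).addDelegationTokens("yarn", creds)
    }
  })
  creds
}
```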

Note that even though in this case the AM will not be using the delegation
tokens created by the submission client, those tokens still need to be provided
to YARN, since they are used to do log aggregation.

To be able to re-use the code in AMCredentialRenewer for the above
purposes, I refactored that class a bit so that it can fetch tokens into
a pre-defined UGI, instead of always logging in.
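
A rough sketch of what "fetch tokens into a pre-defined UGI" means in practice, reusing the
illustrative `obtainFreshTokens` helper sketched above (again, not the renewer's actual code):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

// Sketch only: instead of replacing the process login, merge the freshly fetched tokens
// into an existing UGI (for example, the AM's current user).
def updateTokensIn(targetUgi: UserGroupInformation,
                   principal: String,
                   keytab: String,
                   hadoopConf: Configuration): Unit = {
  val creds = obtainFreshTokens(principal, keytab, hadoopConf)
  targetUgi.addCredentials(creds)
}
```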

Another issue with re-attempts is that, after the fix that allows the AM
to restart correctly, new executors would get confused about when to
update credentials, because the credential updater used the update time
initially set up by the submission code. This could make the executor
fail to update credentials in time, since that value would be very out
of date in the situation described in the bug.

To fix that, I changed the YARN code to use the new RPC-based mechanism
for distributing tokens to executors. This allowed the old credential
updater code to be removed, and a lot of code in the renewer to be
simplified.
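
A minimal sketch of such an RPC-based distribution path, with an illustrative message type
(the real Spark message and endpoint names may differ): the driver serializes the Credentials
using Hadoop's own token-storage format, and each executor merges them into its current UGI.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

import org.apache.hadoop.security.{Credentials, UserGroupInformation}

// Illustrative RPC message carrying serialized delegation tokens.
case class UpdateDelegationTokens(tokenBytes: Array[Byte])

// Driver side: serialize the Credentials so they can be sent over the wire.
def serializeTokens(creds: Credentials): Array[Byte] = {
  val buffer = new ByteArrayOutputStream()
  val out = new DataOutputStream(buffer)
  creds.writeTokenStorageToStream(out)
  out.close()
  buffer.toByteArray
}

// Executor side: deserialize and merge the tokens into the current user's credentials.
def applyTokens(msg: UpdateDelegationTokens): Unit = {
  val creds = new Credentials()
  creds.readTokenStorageStream(new DataInputStream(new ByteArrayInputStream(msg.tokenBytes)))
  UserGroupInformation.getCurrentUser.addCredentials(creds)
}
```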

I also made two currently hardcoded values (the renewal time ratio, and
the retry wait) configurable; while these probably never need to be set
by anyone in a production environment, they help with testing; that's also
why they're not documented.
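
For illustration only, the scheduling math behind those two knobs is essentially the following;
the config keys shown are assumptions made for this sketch, not documented options:

```scala
import org.apache.spark.SparkConf

// Sketch: renew well before expiration, scaled by a configurable ratio; if a fetch fails,
// retry after a configurable wait instead of giving up.
def nextRenewalDelayMs(conf: SparkConf, nextExpirationMs: Long, nowMs: Long): Long = {
  val ratio = conf.getDouble("spark.security.credentials.renewalRatio", 0.75)
  math.max(((nextExpirationMs - nowMs) * ratio).toLong, 0L)
}

def retryDelayMs(conf: SparkConf): Long =
  conf.getTimeAsMs("spark.security.credentials.retryWait", "1h")
```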

Tested on real cluster with a specially crafted application to test this
functionality: checked proper access to HDFS, Hive and HBase in cluster
mode with token renewal on and AM restarts. Tested things still work in
client mode too.

Author: Marcelo Vanzin 

Closes #20657 from vanzin/SPARK-23361.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5fa43847
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5fa43847
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5fa43847

Branch: refs/heads/master
Commit: 5fa438471110afbf4e2174df449ac79e292501f8
Parents: b2edc30
Author: Marcelo Vanzin 
Authored: Fri Mar 23 13:59:21 2018 +0800
Committer: jerryshao 
Committed: Fri Mar 23 13:59:21 2018 +0800

--
 .../main/scala/org/apache/spark/SparkConf.scala |  12 +-
 .../apache/spark/deploy/SparkHadoopUtil.scala   |  32 +-
 .../executor/CoarseGrainedExecutorBackend.scala |  12 -
 .../apache/spark/internal/config/package.scala  |  12 +
 .../MesosHadoopDelegationTokenManager.scala |  11 +-
 .../spark/deploy/yarn/ApplicationMaster.scala   | 117 +++-
 .../org/apache/spark/deploy/yarn/Client.scala   | 102 +++
 .../spark/deploy/yarn/YarnSparkHadoopUtil.scala |  20 --
 .../org/apache/spark/deploy/yarn/config.scala   |  25 --
 .../yarn/security/AMCredentialRenewer.scala | 291 ---
 .../yarn/security/CredentialUpdater.scala   | 131 -
 .../YARNHadoopDelegationTokenManager.scala  |   9 +-
 .../cluster/YarnClientSchedulerBackend.scala|   9 +-
 .../cluster/YarnSchedulerBackend.scala  |  10 +-
 .../YARNHadoopDelegationTokenManagerSuite.scala |   7 +-
 .../org/apache/spark/streaming/Checkpoint.scala |   3 -
 16 files changed, 238 insertions(+), 565 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/5fa43847/core/src/main/scala/org/apache/spark/SparkConf.scala
--
diff --git a/core/src/main/scala/org/apache/spark/SparkConf.scala 
b/core/src/main/scala/org/apache/spark/SparkConf.scala
index f53b2be..129956e 100644
--- a/core/src/main/scala/org/apache/spark/SparkConf.scala
+++ b/core/src/main/scala/org/apache/spark/SparkConf.scala
@@ -603,13 +603,15 @@ private[spark] object SparkConf extends Logging {
 "Please use spark.kryoserializer.buffer instead. The default value for 
" +
   "spark.kryoserializer.buffer.mb was previously specified as '0.064'. 
Fractional values " +
   "are no longer accepted. To specify the equivalent now, one may use 
'64k'."),
-  DeprecatedConfig("spark.rpc", 

svn commit: r25896 - in /dev/spark/2.3.1-SNAPSHOT-2018_03_22_22_01-1d0d0a5-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2018-03-22 Thread pwendell
Author: pwendell
Date: Fri Mar 23 05:19:01 2018
New Revision: 25896

Log:
Apache Spark 2.3.1-SNAPSHOT-2018_03_22_22_01-1d0d0a5 docs


[This commit notification would consist of 1443 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]




spark git commit: [SPARK-23614][SQL] Fix incorrect reuse exchange when caching is used

2018-03-22 Thread wenchen
Repository: spark
Updated Branches:
  refs/heads/branch-2.3 4da8c22f7 -> 1d0d0a5fc


[SPARK-23614][SQL] Fix incorrect reuse exchange when caching is used

## What changes were proposed in this pull request?

We should provide customized plan canonicalization for `InMemoryRelation` and 
`InMemoryTableScanExec`. Otherwise, two different cached plans can be wrongly 
treated as producing the same result, which then causes an exchange to be wrongly reused.

For a test query like this:
```scala
val cached = spark.createDataset(Seq(TestDataUnion(1, 2, 3), TestDataUnion(4, 
5, 6))).cache()
val group1 = cached.groupBy("x").agg(min(col("y")) as "value")
val group2 = cached.groupBy("x").agg(min(col("z")) as "value")
group1.union(group2)
```
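
Wrapped into a self-contained application (names such as `Spark23614Repro` are illustrative),
the same scenario can be run locally; before this fix the second aggregate could silently reuse
the first exchange and return `min(y)` where `min(z)` was expected:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, min}

case class TestDataUnion(x: Int, y: Int, z: Int)

object Spark23614Repro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("SPARK-23614").getOrCreate()
    import spark.implicits._

    val cached = spark.createDataset(Seq(TestDataUnion(1, 2, 3), TestDataUnion(4, 5, 6))).cache()
    val group1 = cached.groupBy("x").agg(min(col("y")) as "value")
    val group2 = cached.groupBy("x").agg(min(col("z")) as "value")

    // The second half of the union should reflect min(z), not a reused min(y) result.
    group1.union(group2).show()
    spark.stop()
  }
}
```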

Canonicalized plans before:

First exchange:
```
Exchange hashpartitioning(none#0, 5)
+- *(1) HashAggregate(keys=[none#0], functions=[partial_min(none#1)], 
output=[none#0, none#4])
   +- *(1) InMemoryTableScan [none#0, none#1]
 +- InMemoryRelation [x#4253, y#4254, z#4255], true, 1, 
StorageLevel(disk, memory, deserialized, 1 replicas)
   +- LocalTableScan [x#4253, y#4254, z#4255]
```

Second exchange:
```
Exchange hashpartitioning(none#0, 5)
+- *(3) HashAggregate(keys=[none#0], functions=[partial_min(none#1)], 
output=[none#0, none#4])
   +- *(3) InMemoryTableScan [none#0, none#1]
 +- InMemoryRelation [x#4253, y#4254, z#4255], true, 1, 
StorageLevel(disk, memory, deserialized, 1 replicas)
   +- LocalTableScan [x#4253, y#4254, z#4255]
```

You can see that the canonicalized plans are the same, even though the two 
`InMemoryTableScan`s actually use different columns.

Canonicalized plan after:

First exchange:
```
Exchange hashpartitioning(none#0, 5)
+- *(1) HashAggregate(keys=[none#0], functions=[partial_min(none#1)], 
output=[none#0, none#4])
   +- *(1) InMemoryTableScan [none#0, none#1]
 +- InMemoryRelation [none#0, none#1, none#2], true, 1, 
StorageLevel(memory, 1 replicas)
   +- LocalTableScan [none#0, none#1, none#2]
```

Second exchange:
```
Exchange hashpartitioning(none#0, 5)
+- *(3) HashAggregate(keys=[none#0], functions=[partial_min(none#1)], 
output=[none#0, none#4])
   +- *(3) InMemoryTableScan [none#0, none#2]
 +- InMemoryRelation [none#0, none#1, none#2], true, 1, 
StorageLevel(memory, 1 replicas)
   +- LocalTableScan [none#0, none#1, none#2]
```

## How was this patch tested?

Added unit test.

Author: Liang-Chi Hsieh 

Closes #20831 from viirya/SPARK-23614.

(cherry picked from commit b2edc30db1dcc6102687d20c158a2700965fdf51)
Signed-off-by: Wenchen Fan 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1d0d0a5f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1d0d0a5f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1d0d0a5f

Branch: refs/heads/branch-2.3
Commit: 1d0d0a5fc7ee009443797feb48823eb215d1940a
Parents: 4da8c22
Author: Liang-Chi Hsieh 
Authored: Thu Mar 22 21:23:25 2018 -0700
Committer: Wenchen Fan 
Committed: Thu Mar 22 21:23:34 2018 -0700

--
 .../execution/columnar/InMemoryRelation.scala| 10 ++
 .../columnar/InMemoryTableScanExec.scala | 19 +--
 .../org/apache/spark/sql/DatasetSuite.scala  |  9 +
 .../spark/sql/execution/ExchangeSuite.scala  |  7 +++
 4 files changed, 39 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/1d0d0a5f/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
index 22e1691..2579046 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
@@ -24,6 +24,7 @@ import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation
 import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.QueryPlan
 import org.apache.spark.sql.catalyst.plans.logical
 import org.apache.spark.sql.catalyst.plans.logical.{HintInfo, Statistics}
 import org.apache.spark.sql.execution.SparkPlan
@@ -68,6 +69,15 @@ case class InMemoryRelation(
 
   override protected def innerChildren: Seq[SparkPlan] = Seq(child)
 
+  override def doCanonicalize(): logical.LogicalPlan =
+    copy(output = output.map(QueryPlan.normalizeExprId(_, child.output)),
+      storageLevel = 

spark git commit: [SPARK-23614][SQL] Fix incorrect reuse exchange when caching is used

2018-03-22 Thread wenchen
Repository: spark
Updated Branches:
  refs/heads/master a649fcf32 -> b2edc30db


[SPARK-23614][SQL] Fix incorrect reuse exchange when caching is used

## What changes were proposed in this pull request?

We should provide customized plan canonicalization for `InMemoryRelation` and 
`InMemoryTableScanExec`. Otherwise, two different cached plans can be wrongly 
treated as producing the same result, which then causes an exchange to be wrongly reused.

For a test query like this:
```scala
val cached = spark.createDataset(Seq(TestDataUnion(1, 2, 3), TestDataUnion(4, 
5, 6))).cache()
val group1 = cached.groupBy("x").agg(min(col("y")) as "value")
val group2 = cached.groupBy("x").agg(min(col("z")) as "value")
group1.union(group2)
```

Canonicalized plans before:

First exchange:
```
Exchange hashpartitioning(none#0, 5)
+- *(1) HashAggregate(keys=[none#0], functions=[partial_min(none#1)], 
output=[none#0, none#4])
   +- *(1) InMemoryTableScan [none#0, none#1]
 +- InMemoryRelation [x#4253, y#4254, z#4255], true, 1, 
StorageLevel(disk, memory, deserialized, 1 replicas)
   +- LocalTableScan [x#4253, y#4254, z#4255]
```

Second exchange:
```
Exchange hashpartitioning(none#0, 5)
+- *(3) HashAggregate(keys=[none#0], functions=[partial_min(none#1)], 
output=[none#0, none#4])
   +- *(3) InMemoryTableScan [none#0, none#1]
 +- InMemoryRelation [x#4253, y#4254, z#4255], true, 1, 
StorageLevel(disk, memory, deserialized, 1 replicas)
   +- LocalTableScan [x#4253, y#4254, z#4255]
```

You can see that the canonicalized plans are the same, even though the two 
`InMemoryTableScan`s actually use different columns.

Canonicalized plan after:

First exchange:
```
Exchange hashpartitioning(none#0, 5)
+- *(1) HashAggregate(keys=[none#0], functions=[partial_min(none#1)], 
output=[none#0, none#4])
   +- *(1) InMemoryTableScan [none#0, none#1]
 +- InMemoryRelation [none#0, none#1, none#2], true, 1, 
StorageLevel(memory, 1 replicas)
   +- LocalTableScan [none#0, none#1, none#2]
```

Second exchange:
```
Exchange hashpartitioning(none#0, 5)
+- *(3) HashAggregate(keys=[none#0], functions=[partial_min(none#1)], 
output=[none#0, none#4])
   +- *(3) InMemoryTableScan [none#0, none#2]
 +- InMemoryRelation [none#0, none#1, none#2], true, 1, 
StorageLevel(memory, 1 replicas)
   +- LocalTableScan [none#0, none#1, none#2]
```

## How was this patch tested?

Added unit test.

Author: Liang-Chi Hsieh 

Closes #20831 from viirya/SPARK-23614.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b2edc30d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b2edc30d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b2edc30d

Branch: refs/heads/master
Commit: b2edc30db1dcc6102687d20c158a2700965fdf51
Parents: a649fcf
Author: Liang-Chi Hsieh 
Authored: Thu Mar 22 21:23:25 2018 -0700
Committer: Wenchen Fan 
Committed: Thu Mar 22 21:23:25 2018 -0700

--
 .../execution/columnar/InMemoryRelation.scala| 10 ++
 .../columnar/InMemoryTableScanExec.scala | 19 +--
 .../org/apache/spark/sql/DatasetSuite.scala  |  9 +
 .../spark/sql/execution/ExchangeSuite.scala  |  7 +++
 4 files changed, 39 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/b2edc30d/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
index 22e1691..2579046 100644
--- 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
+++ 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala
@@ -24,6 +24,7 @@ import org.apache.spark.rdd.RDD
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation
 import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.QueryPlan
 import org.apache.spark.sql.catalyst.plans.logical
 import org.apache.spark.sql.catalyst.plans.logical.{HintInfo, Statistics}
 import org.apache.spark.sql.execution.SparkPlan
@@ -68,6 +69,15 @@ case class InMemoryRelation(
 
   override protected def innerChildren: Seq[SparkPlan] = Seq(child)
 
+  override def doCanonicalize(): logical.LogicalPlan =
+    copy(output = output.map(QueryPlan.normalizeExprId(_, child.output)),
+      storageLevel = StorageLevel.NONE,
+      child = child.canonicalized,
+      tableName = None)(
+      _cachedColumnBuffers,
+  

spark git commit: [MINOR][PYTHON] Remove unused codes in schema parsing logics of PySpark

2018-03-22 Thread wenchen
Repository: spark
Updated Branches:
  refs/heads/master 4d37008c7 -> a649fcf32


[MINOR][PYTHON] Remove unused codes in schema parsing logics of PySpark

## What changes were proposed in this pull request?

This PR proposes to remove the unused code paths `_ignore_brackets_split` and 
`_BRACKETS`.

`_ignore_brackets_split` was introduced in 
https://github.com/apache/spark/commit/d57daf1f7732a7ac54a91fe112deeda0a254f9ef 
to refactor and support `toDF("...")`; however, 
https://github.com/apache/spark/commit/ebc124d4c44d4c84f7868f390f778c0ff5cd66cb 
replaced that logic. It seems `_ignore_brackets_split` is no longer referenced 
anywhere.

`_BRACKETS` was introduced in 
https://github.com/apache/spark/commit/880eabec37c69ce4e9594d7babfac291b0f93f50;
 however, all other usages were removed in 
https://github.com/apache/spark/commit/648a8626b82d27d84db3e48bccfd73d020828586.

This is really a follow-up to 
https://github.com/apache/spark/commit/ebc124d4c44d4c84f7868f390f778c0ff5cd66cb, 
which I missed in that PR.

## How was this patch tested?

Manually tested. Existing tests should cover this. I also double checked by 
`grep` in the whole repo.

Author: hyukjinkwon 

Closes #20878 from HyukjinKwon/minor-remove-unused.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a649fcf3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a649fcf3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a649fcf3

Branch: refs/heads/master
Commit: a649fcf32a7e610da2a2b4e3d94f5d1372c825d6
Parents: 4d37008
Author: hyukjinkwon 
Authored: Thu Mar 22 21:20:41 2018 -0700
Committer: Wenchen Fan 
Committed: Thu Mar 22 21:20:41 2018 -0700

--
 python/pyspark/sql/types.py | 35 ---
 1 file changed, 35 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a649fcf3/python/pyspark/sql/types.py
--
diff --git a/python/pyspark/sql/types.py b/python/pyspark/sql/types.py
index 826aab9..5d5919e 100644
--- a/python/pyspark/sql/types.py
+++ b/python/pyspark/sql/types.py
@@ -752,41 +752,6 @@ _all_complex_types = dict((v.typeName(), v)
 _FIXED_DECIMAL = re.compile("decimal\\(\\s*(\\d+)\\s*,\\s*(\\d+)\\s*\\)")
 
 
-_BRACKETS = {'(': ')', '[': ']', '{': '}'}
-
-
-def _ignore_brackets_split(s, separator):
-    """
-    Splits the given string by given separator, but ignore separators inside brackets pairs, e.g.
-    given "a,b" and separator ",", it will return ["a", "b"], but given "a<b,c>, d", it will return
-    ["a<b,c>", "d"].
-    """
-    parts = []
-    buf = ""
-    level = 0
-    for c in s:
-        if c in _BRACKETS.keys():
-            level += 1
-            buf += c
-        elif c in _BRACKETS.values():
-            if level == 0:
-                raise ValueError("Brackets are not correctly paired: %s" % s)
-            level -= 1
-            buf += c
-        elif c == separator and level > 0:
-            buf += c
-        elif c == separator:
-            parts.append(buf)
-            buf = ""
-        else:
-            buf += c
-
-    if len(buf) == 0:
-        raise ValueError("The %s cannot be the last char: %s" % (separator, s))
-    parts.append(buf)
-    return parts
-
-
 def _parse_datatype_string(s):
 """
 Parses the given data type string to a :class:`DataType`. The data type 
string format equals





svn commit: r25895 - in /dev/spark/2.4.0-SNAPSHOT-2018_03_22_12_01-4d37008-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2018-03-22 Thread pwendell
Author: pwendell
Date: Thu Mar 22 19:15:31 2018
New Revision: 25895

Log:
Apache Spark 2.4.0-SNAPSHOT-2018_03_22_12_01-4d37008 docs


[This commit notification would consist of 1449 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]




spark git commit: [SPARK-23599][SQL] Use RandomUUIDGenerator in Uuid expression

2018-03-22 Thread hvanhovell
Repository: spark
Updated Branches:
  refs/heads/master 5c9eaa6b5 -> 4d37008c7


[SPARK-23599][SQL] Use RandomUUIDGenerator in Uuid expression

## What changes were proposed in this pull request?

As stated in the Jira ticket, there are problems with the current `Uuid` expression, which 
uses `java.util.UUID.randomUUID` for UUID generation.

This patch uses the newly added `RandomUUIDGenerator` for UUID generation. So 
we can make `Uuid` deterministic between retries.
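
For intuition only, the deterministic-retry idea can be sketched outside Spark like this (this is
not `RandomUUIDGenerator` itself; the helper names are illustrative): seed a per-partition RNG from
the seed assigned at analysis time plus the partition index, so a retried task regenerates the
same UUIDs instead of drawing fresh ones from `java.util.UUID.randomUUID`.

```scala
import java.util.UUID

import scala.util.Random

// Build a valid version-4 UUID from a seeded RNG by forcing the version and variant bits.
def seededUuid(rng: Random): UUID = {
  val most = (rng.nextLong() & ~(0xfL << 12)) | (0x4L << 12)  // version 4
  val least = (rng.nextLong() & ~(0x3L << 62)) | (0x2L << 62) // RFC 4122 variant
  new UUID(most, least)
}

// Same seed + same partition index => same UUID sequence on retry.
def uuidsForPartition(seed: Long, partitionIndex: Int, count: Int): Seq[UUID] = {
  val rng = new Random(seed + partitionIndex)
  Seq.fill(count)(seededUuid(rng))
}
```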

## How was this patch tested?

Added unit tests.

Author: Liang-Chi Hsieh 

Closes #20861 from viirya/SPARK-23599-2.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4d37008c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4d37008c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4d37008c

Branch: refs/heads/master
Commit: 4d37008c78d7d6b8f8a649b375ecc090700eca4f
Parents: 5c9eaa6
Author: Liang-Chi Hsieh 
Authored: Thu Mar 22 19:57:32 2018 +0100
Committer: Herman van Hovell 
Committed: Thu Mar 22 19:57:32 2018 +0100

--
 .../spark/sql/catalyst/analysis/Analyzer.scala  | 16 +
 .../spark/sql/catalyst/expressions/misc.scala   | 26 +--
 .../analysis/ResolvedUuidExpressionsSuite.scala | 73 
 .../expressions/ExpressionEvalHelper.scala  |  5 +-
 .../expressions/MiscExpressionsSuite.scala  | 19 -
 .../org/apache/spark/sql/DataFrameSuite.scala   |  6 ++
 6 files changed, 136 insertions(+), 9 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/4d37008c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
index 7848f88..e821e96 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
@@ -18,6 +18,7 @@
 package org.apache.spark.sql.catalyst.analysis
 
 import scala.collection.mutable.ArrayBuffer
+import scala.util.Random
 
 import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.catalyst._
@@ -177,6 +178,7 @@ class Analyzer(
   TimeWindowing ::
   ResolveInlineTables(conf) ::
   ResolveTimeZone(conf) ::
+  ResolvedUuidExpressions ::
   TypeCoercion.typeCoercionRules(conf) ++
   extendedResolutionRules : _*),
 Batch("Post-Hoc Resolution", Once, postHocResolutionRules: _*),
@@ -1995,6 +1997,20 @@ class Analyzer(
   }
 
   /**
+   * Set the seed for random number generation in Uuid expressions.
+   */
+  object ResolvedUuidExpressions extends Rule[LogicalPlan] {
+    private lazy val random = new Random()
+
+    override def apply(plan: LogicalPlan): LogicalPlan = plan.transformUp {
+      case p if p.resolved => p
+      case p => p transformExpressionsUp {
+        case Uuid(None) => Uuid(Some(random.nextLong()))
+      }
+    }
+  }
+
+  /**
* Correctly handle null primitive inputs for UDF by adding extra [[If]] 
expression to do the
* null check.  When user defines a UDF with primitive parameters, there is 
no way to tell if the
* primitive parameter is null or not, so here we assume the primitive input 
is null-propagatable

http://git-wip-us.apache.org/repos/asf/spark/blob/4d37008c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
index 38e4fe4..ec93620 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala
@@ -21,6 +21,7 @@ import java.util.UUID
 
 import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.catalyst.expressions.codegen._
+import org.apache.spark.sql.catalyst.util.RandomUUIDGenerator
 import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.UTF8String
 
@@ -122,18 +123,33 @@ case class CurrentDatabase() extends LeafExpression with 
Unevaluable {
46707d92-02f4-4817-8116-a4c3b23e6266
   """)
 // scalastyle:on line.size.limit
-case class Uuid() extends LeafExpression {
+case class Uuid(randomSeed: Option[Long] = None) extends LeafExpression with 
Nondeterministic {
 
-  override lazy val deterministic: Boolean = false
+  def this() = this(None)
+
+  override lazy val resolved: Boolean = randomSeed.isDefined
 
   

svn commit: r25886 - in /dev/spark/2.4.0-SNAPSHOT-2018_03_22_00_01-5c9eaa6-docs: ./ _site/ _site/api/ _site/api/R/ _site/api/java/ _site/api/java/lib/ _site/api/java/org/ _site/api/java/org/apache/ _s

2018-03-22 Thread pwendell
Author: pwendell
Date: Thu Mar 22 07:16:36 2018
New Revision: 25886

Log:
Apache Spark 2.4.0-SNAPSHOT-2018_03_22_00_01-5c9eaa6 docs


[This commit notification would consist of 1449 parts, 
which exceeds the limit of 50 ones, so it was shortened to the summary.]
