GitHub user viirya opened a pull request:
https://github.com/apache/spark/pull/21980
[SPARK-25010][SQL] Rand/Randn should produce different values for each
execution in streaming query
## What changes were proposed in this pull request?
As with Uuid in SPARK-24896, Rand and Randn expressions currently produce the
same results for each execution in a streaming query. This doesn't make much
sense for streaming queries. We should make them produce different results for
each execution, as was done for Uuid.
In this change, similar to Uuid, we assign new random seeds to Rand/Randn
when returning the optimized plan from `IncrementalExecution`.
Note: Unlike Uuid, Rand/Randn can be created with an initial seed.
Because we replace this initial seed in `IncrementalExecution`, the initial
seed is no longer used. For now this doesn't seem to me like a big issue for
streaming queries, but it needs to be confirmed with others. cc @zsxwing @cloud-fan
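To make the reseeding idea and its effect on a user-supplied seed concrete, here is a minimal sketch in plain Python. It is a toy model, not Spark code: `Rand`, `Project`, `reseed` and `run_batch` are hypothetical names invented for illustration, and the way the fresh seed is drawn is an assumption rather than what the patch does.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Rand:
    """Toy stand-in for a Rand/Randn expression: it only carries a seed."""
    seed: int


@dataclass(frozen=True)
class Project:
    """Toy stand-in for a plan node that holds a tuple of expressions."""
    exprs: Tuple[Union[Rand, str], ...]


def reseed(plan: Project, rng: random.Random) -> Project:
    """Return a copy of the plan in which every Rand has a fresh seed."""
    new_exprs = tuple(
        replace(e, seed=rng.getrandbits(63)) if isinstance(e, Rand) else e
        for e in plan.exprs
    )
    return Project(new_exprs)


def run_batch(plan: Project, rows: int) -> List[List[float]]:
    """Evaluate each Rand over `rows` rows, seeding a generator from it."""
    out = []
    for e in plan.exprs:
        if isinstance(e, Rand):
            gen = random.Random(e.seed)
            out.append([gen.random() for _ in range(rows)])
    return out


# The user created the expression with an explicit initial seed.
user_plan = Project((Rand(seed=42), "value"))
seed_source = random.Random()  # assumption: any source of fresh seeds

# Without reseeding, every execution replays the same sequence.
same_a = run_batch(user_plan, 3)
same_b = run_batch(user_plan, 3)

# With reseeding per execution, the initial seed 42 is discarded.
batch_1 = reseed(user_plan, seed_source)
batch_2 = reseed(user_plan, seed_source)
```

In this sketch the original plan is left untouched and each execution gets its own copy, so the loss of the initial seed is visible only in the per-execution plans.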
## How was this patch tested?
Added a test.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/viirya/spark-1 SPARK-25010
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21980.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21980
----
commit 1e0370ec1c5f3920a3ba59abb46446e255ecb55b
Author: Liang-Chi Hsieh <viirya@...>
Date: 2018-08-02T23:35:10Z
Rand/Randn should produce different values for each execution in streaming
query.
----