[GitHub] spark pull request #21980: [SPARK-25010][SQL] Rand/Randn should produce diff...

viirya Thu, 02 Aug 2018 16:43:43 -0700

GitHub user viirya opened a pull request:

    https://github.com/apache/spark/pull/21980


    [SPARK-25010][SQL] Rand/Randn should produce different values for each 
execution in streaming query

    ## What changes were proposed in this pull request?
    
    As with the Uuid issue in SPARK-24896, Rand and Randn expressions currently 
produce the same results for each execution in a streaming query, which makes 
little sense for streaming queries. They should produce different results for 
each execution, as Uuid does.
    
    In this change, similar to Uuid, we assign new random seeds to Rand/Randn 
when returning the optimized plan from `IncrementalExecution`.
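
The idea can be sketched outside Spark with a toy expression tree. This is only an illustration under stated assumptions, not the code in this patch: the `Expr` types, `ReseedSketch`, `reseed`, and `eval` are hypothetical stand-ins for Catalyst expressions and for the rewrite done in `IncrementalExecution`.

```scala
import scala.util.Random

// Toy expression tree. These types are illustrative stand-ins,
// not Spark's Catalyst classes.
sealed trait Expr
case class Rand(seed: Long) extends Expr
case class Randn(seed: Long) extends Expr
case class Literal(value: Double) extends Expr
case class Add(left: Expr, right: Expr) extends Expr

object ReseedSketch {
  // Rebuild the tree, giving every Rand/Randn node a seed drawn from nextSeed.
  // Any seed the node already carried is discarded.
  def reseed(expr: Expr, nextSeed: () => Long): Expr = expr match {
    case Rand(_)   => Rand(nextSeed())
    case Randn(_)  => Randn(nextSeed())
    case Add(l, r) => Add(reseed(l, nextSeed), reseed(r, nextSeed))
    case other     => other
  }

  // Evaluate a single value. The result depends only on the seeds in the
  // tree, so an unchanged tree always yields the same value.
  def eval(expr: Expr): Double = expr match {
    case Rand(seed)  => new Random(seed).nextDouble()
    case Randn(seed) => new Random(seed).nextGaussian()
    case Literal(v)  => v
    case Add(l, r)   => eval(l) + eval(r)
  }

  def main(args: Array[String]): Unit = {
    val plan = Add(Rand(42L), Literal(1.0))
    val seeds = new Random()
    // Each simulated execution gets its own copy of the plan with fresh seeds.
    for (batch <- 1 to 3) {
      val perBatchPlan = reseed(plan, () => seeds.nextLong())
      println(s"batch $batch: ${eval(perBatchPlan)}")
    }
  }
}
```

Without the `reseed` step, every simulated execution would evaluate the same seeded tree and print the same value, which mirrors the behavior described above.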
    
    Note: Unlike Uuid, Rand/Randn can be created with an initial seed. Because 
we replace this seed in `IncrementalExecution`, the initial seed is no longer 
used. For now this does not seem to me to be a big issue for streaming queries, 
but it needs to be confirmed with others. cc @zsxwing @cloud-fan 
    
    ## How was this patch tested?
    
    Added test.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/viirya/spark-1 SPARK-25010

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21980.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21980
    
----
commit 1e0370ec1c5f3920a3ba59abb46446e255ecb55b
Author: Liang-Chi Hsieh <viirya@...>
Date:   2018-08-02T23:35:10Z

    Rand/Randn should produce different values for each execution in streaming 
query.

----

