GitHub user gatorsmile opened a pull request:
https://github.com/apache/spark/pull/11232
[SPARK-13333] [SQL] Added Rand and Randn Functions Generating Deterministic
Results
The `rand` and `randn` functions are commonly called with a `seed`
argument.
Users reasonably expect the results of `rand` and `randn` to be
deterministic when a `seed` value is provided. However, the current
implementation does not guarantee this: the generated values depend on data
partitioning and task scheduling.
An example has been given by @jkbradley in the following JIRA:
https://issues.apache.org/jira/browse/SPARK-13333
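The partition dependence described above can be illustrated with a small, Spark-free simulation. This is only a sketch: it assumes each partition creates its own generator seeded with `seed + partition index`, and the helper `rand_column` and its contiguous row split are hypothetical, not Spark code.

```python
import random


def rand_column(rows, num_partitions, seed):
    """Attach a pseudo-random value to each row, one RNG per partition.

    Assumption for illustration: every partition seeds its own generator
    with `seed + partition_index`. This mimics a per-partition seeded RNG;
    it is not taken from the Spark source. Requires num_partitions >= 1.
    """
    size = -(-len(rows) // num_partitions)  # ceiling division
    out = []
    for p in range(num_partitions):
        rng = random.Random(seed + p)  # fresh generator for this partition
        for row in rows[p * size:(p + 1) * size]:
            out.append((row, rng.random()))
    return out


rows = list(range(8))
two_parts = rand_column(rows, num_partitions=2, seed=42)
four_parts = rand_column(rows, num_partitions=4, seed=42)

# Same seed and same partitioning: identical output.
print(two_parts == rand_column(rows, num_partitions=2, seed=42))
# Same seed, different partitioning: the values attached to rows differ.
print(two_parts == four_parts)
```

With the same seed, the value attached to a given row changes as soon as the rows are split differently, which is the kind of behavior the JIRA example reports.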
This PR introduces a new parameter `deterministic` for the `Rand` and
`Randn` functions. When users set it to true, the results are deterministic.
**Question:** should we introduce a new parameter `deterministic` for `Rand`
and `Randn`, or just make the results deterministic whenever users supply a
`seed` value?
@rxin @marmbrus @cloud-fan @jkbradley
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gatorsmile/spark randSeed
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11232.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11232
----
commit 3f90749148113069c312d5a03b09a67b054e5620
Author: gatorsmile <[email protected]>
Date: 2016-02-17T04:28:05Z
added a random function that can generate deterministic results
----