Github user gatorsmile commented on the pull request:
https://github.com/apache/spark/pull/11232#issuecomment-186024225
When users provide a specific seed number, they always expect a
deterministic result. However, the current implementation returns a
non-deterministic result.
My concern is: why would users call a rand/randn function with a specific
seed? What are the use cases for these APIs? Most external users cannot
control the actual seed values, which are affected by data partitioning and
task scheduling.
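To illustrate the point, here is a rough plain-Python sketch, not Spark code. It assumes each partition builds its own generator from the user's seed plus the partition index; `rand_column` and its parameters are hypothetical names used only for this example.

```python
import random

def rand_column(rows, num_partitions, seed):
    """Simulate a seeded rand column over rows split into contiguous partitions."""
    size = -(-len(rows) // num_partitions)  # ceiling division
    values = []
    for index in range(num_partitions):
        # Assumption: each partition derives its own generator from seed + index.
        rng = random.Random(seed + index)
        for _ in rows[index * size:(index + 1) * size]:
            values.append(rng.random())
    return values

rows = list(range(6))
two = rand_column(rows, num_partitions=2, seed=42)
three = rand_column(rows, num_partitions=3, seed=42)
same_layout = rand_column(rows, num_partitions=2, seed=42)

print(two == same_layout)  # same seed, same partition layout
print(two == three)        # same seed, different partition layout
```

Under this assumption the same seed reproduces the column only while the partition layout stays the same, and users usually do not control that layout.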
How about removing the `rand(seed)` and `randn(seed)` APIs if they confuse
users?