[GitHub] spark pull request: [SPARK-13333] [SQL] Added Rand and Randn Funct...

gatorsmile Thu, 18 Feb 2016 18:59:39 -0800

Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/11232#issuecomment-186024225
  
    When users provide a specific seed number, they always expect a 
deterministic result. However, the current implementation returns a 
non-deterministic result. 
    
    My concern is: why would users call a rand/randn function with a specific 
seed? What are the use cases for these APIs? Most external users are unable 
to control the actual seed values, which are affected by data partitioning 
and task scheduling.  
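    To illustrate the partitioning concern, here is a minimal Python sketch. It is 
not Spark code. It assumes a hypothetical model in which each partition draws 
from its own generator seeded with `seed + partition_index`, and the helper 
`rand_with_seed` and its parameters are made up for illustration. Under that 
assumption, the same seed reproduces the same values only when the 
partitioning is also the same:

```python
import random


def rand_with_seed(rows, num_partitions, seed):
    """Simulate a seeded rand() column over rows split into partitions.

    Hypothetical model (an assumption, not Spark's actual code): rows are
    assigned to partitions in contiguous chunks, and each partition uses its
    own generator seeded with seed + partition_index.
    """
    chunk = -(-len(rows) // num_partitions)  # ceiling division
    result = {}
    for p in range(num_partitions):
        rng = random.Random(seed + p)  # per-partition generator
        for row in rows[p * chunk:(p + 1) * chunk]:
            result[row] = rng.random()
    return result


rows = list(range(8))
a = rand_with_seed(rows, num_partitions=2, seed=42)
b = rand_with_seed(rows, num_partitions=4, seed=42)
c = rand_with_seed(rows, num_partitions=2, seed=42)

print(a == c)  # same seed, same partitioning: identical values
print(a == b)  # same seed, different partitioning: values differ
```

    In this model the user-supplied seed only fixes the starting point of each 
partition's generator, so any change in how rows land in partitions changes 
the per-row values.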
    
    How about removing the APIs `rand(seed)` and `randn(seed)` if they 
confuse users?
