GitHub user srowen commented on the issue:

    https://github.com/apache/spark/pull/14524
  
    Yes, isn't that why it's possible to fix a seed? I can understand an 
argument that the default should be non-random, but every API I've ever seen 
(including Spark's) defaults to a random seed.
    
    The Spark tests mostly fix a seed, for this reason I'd assume.
    
    The flip side is the surprise when algorithms don't actually exhibit 
random behavior: N runs of some inherently stochastic process might produce 
the exact same result.
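
    For concreteness, here is a minimal PySpark sketch of both points, 
assuming a local Spark 2.x session and the 
DataFrame.sample(withReplacement, fraction, seed) signature: fixing the seed 
makes runs reproducible, and that same determinism is exactly the 
"non-random" surprise.

        from pyspark.sql import SparkSession

        spark = SparkSession.builder.master("local[1]").appName("seed-demo").getOrCreate()
        df = spark.range(100)

        # Fixed seed: the same rows are selected on every run, which is what
        # tests want -- and also why N runs look identical.
        a = df.sample(False, 0.1, 42).collect()
        b = df.sample(False, 0.1, 42).collect()
        assert a == b

        # Seed omitted: Spark draws a random seed, so results vary run to run.
        c = df.sample(False, 0.1).collect()

        spark.stop()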
    
    Is the argument that you might be consuming some third-party 
implementation that doesn't let you fix a seed, so you can't debug it 
reproducibly?
    
    I wouldn't push hard for this change, but I would like to understand the 
logic better.
    
    How about the fixes to the Python API so that explicitly specifying 
seed=None gives a random seed? That doesn't seem to work in some cases. I 
could make just that part of the change.
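
    To make that seed=None convention concrete, here is a minimal plain-Python 
sketch of the behavior being asked for (sample_indices is a made-up helper, 
not a Spark API): an explicit seed=None is treated the same as omitting the 
seed, i.e. a fresh random seed is drawn.

        import random

        def sample_indices(n, k, seed=None):
            # An explicit seed=None (or no seed at all) means "draw a fresh
            # random seed", so repeated calls differ; passing an int makes
            # the call reproducible.
            if seed is None:
                seed = random.randint(0, 2**31 - 1)
            return random.Random(seed).sample(range(n), k)

        # Reproducible with a fixed seed; varies across calls when seed is None.
        assert sample_indices(100, 5, seed=7) == sample_indices(100, 5, seed=7)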

