GitHub user smartnut007 opened a pull request:

    https://github.com/apache/spark/pull/477

    SPARK-1438 RDD.sample() make seed param optional

    Copying from the previous pull request https://github.com/apache/spark/pull/462
    
    It's probably better to let the underlying language implementation take care
    of the default. This was easier to do in Python, as the default value for
    seed in random and numpy.random is None.
    
    On the Scala/Java side it might mean propagating an Option or null (oh no!)
    down the chain to where the Random is constructed. But it looks like the
    convention in some other methods was to use System.nanoTime, so I followed
    that convention.
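
    A minimal sketch of the Python-side idea, not the actual PySpark code:
    the function name sample and its list-based input are hypothetical
    stand-ins for RDD.sample. It only illustrates letting the language's own
    generator handle a seed of None.

```python
import random


def sample(items, fraction, with_replacement=False, seed=None):
    """Hypothetical stand-in for RDD.sample, operating on a plain list."""
    # random.Random(None) seeds itself from OS entropy or the clock,
    # so no explicit default seed has to be chosen here.
    rng = random.Random(seed)
    items = list(items)
    if with_replacement:
        # Draw a fixed number of elements, allowing repeats.
        k = int(round(fraction * len(items)))
        return [rng.choice(items) for _ in range(k)]
    # Keep each element independently with probability `fraction`.
    return [x for x in items if rng.random() < fraction]
```

    Calling sample(data, 0.5) gives a different result on each run, while
    sample(data, 0.5, seed=42) is repeatable.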
    
    This conflicts with the overloaded method sql.SchemaRDD.sample, which also
    defines default params:
    sample(fraction, withReplacement=false, seed=math.random)
    Scala does not allow more than one overloaded alternative of a method to
    define default params. I believe the author intended to override the
    RDD.sample method, not overload it, so I changed it.
    
    If backward compatibility is important, three new methods can be introduced
    (without default params) like this:
    sample(fraction)
    sample(fraction, withReplacement)
    sample(fraction, withReplacement, seed)
    
    Added some tests for the Scala RDD takeSample method.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/smartnut007/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/477.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #477
    
----
commit 0c247dba6084313873b539bcf230371c903f04b3
Author: Arun Ramakrishnan <[email protected]>
Date:   2014-04-21T07:41:09Z

    SPARK-1438 RDD language apis to support optional seed in RDD methods 
sample/takeSample

commit 69619c6686cc7ff7113f8ef031f3ed3698bafa25
Author: Arun Ramakrishnan <[email protected]>
Date:   2014-04-22T04:37:22Z

    SPARK-1438 fix spacing issue

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
