GitHub user smartnut007 opened a pull request:

    https://github.com/apache/spark/pull/462

    SPARK-1438 RDD make seed optional in RDD methods sam...

    It's probably better to let the underlying language implementation take care 
of the default seed if the user does not specify one. This was easier to do 
in Python, where the default seed value in both random and numpy.random is None.
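
A minimal sketch of the Python behavior described above. The helper name `take_sample` is hypothetical and is not PySpark's API; it only illustrates that passing `seed=None` to `random.Random` lets the interpreter pick its own seed, while an explicit seed makes the sample reproducible:

```python
import random


def take_sample(items, num, seed=None):
    """Hypothetical helper (not PySpark's API): sample up to `num` items without replacement."""
    # With seed=None, random.Random seeds itself from OS entropy or the current time.
    rng = random.Random(seed)
    pool = list(items)
    return rng.sample(pool, min(num, len(pool)))


data = range(100)
fixed_a = take_sample(data, 5, seed=42)
fixed_b = take_sample(data, 5, seed=42)  # same seed, so the same sample
unseeded = take_sample(data, 5)          # seed left to the language default
```

Callers who need reproducibility pass a seed explicitly; everyone else gets a fresh sample on each call without the API having to invent a default value.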
    
    On the Scala/Java side it might mean propagating an Option or null (oh no!) down 
the chain to where the Random is constructed. However, the convention 
in some other methods appears to be System.nanoTime, so I followed that convention. 
    
    Conflict with overloaded method in sql.SchemaRDD
    SchemaRDD defines an overloaded method 
    sample(fraction, withReplacement=false, seed=math.random)
    
    So SchemaRDD had two sample methods with the same parameters in a different 
order. I believe the author intended to override the RDD.sample method, not 
overload it, so I changed it.
    
    Also, Scala does not allow more than one overloaded method to have default 
params, so this code had to be modified. Not sure whether there is existing 
application code that might break because of this. If we need to keep things 
backward compatible, 3 new methods can be introduced (without default params) 
like this
    sample(fraction)
    sample(fraction, withReplacement)
    sample(fraction, withReplacement, seed)
    
    Added some tests for the Scala RDD takeSample method. Was able to test the 
Java side manually. 


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/smartnut007/spark branch-1.0

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/462.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #462
    
----
commit cb240b3c52149b2afc1195752c3ec0438bb0cd10
Author: Arun Ramakrishnan <[email protected]>
Date:   2014-04-21T07:41:09Z

    SPARK-1438 RDD language apis to support optional seed in RDD methods 
sample/takeSample

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
