Sorry for the late reply.

Here is what I was thinking:

import random as r
from pyspark import SparkContext

def main():
    # get a SparkContext
    sc = SparkContext(appName="random-seed-demo")

    # Just for fun, let's assume the seed is an id
    # (note: id() changes from run to run, so use a fixed constant
    # if you want the exact same shuffle on every run)
    filename = "bin.dat"
    seed = id(filename)

    # broadcast it so every worker sees the same value; make it a
    # module-level global so randomize() below can pick it up
    global br
    br = sc.broadcast(seed)

    # set up a dummy 4x4 list of lists
    lst = []
    for i in range(4):
        x = []
        for j in range(4):
            x.append(j)
        lst.append(x)
    print(lst)

    base = sc.parallelize(lst)
    print(base.map(randomize).collect())

randomize looks like this (it reads the broadcast value through br, which is why br has to be visible at module level so the function shipped to the workers can see it):
def randomize(lst):
    # every task re-seeds with the same broadcast value, so the
    # shuffle is reproducible across runs
    local_seed = br.value
    r.seed(local_seed)
    r.shuffle(lst)
    return lst
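
If you would rather plant the seed once per partition (closer to "once on each worker") instead of inside every map call, a minimal sketch using mapPartitions could look like the following. It assumes the same sc, br and base as above; randomize_partition is just a name I made up for this example.

def randomize_partition(rows):
    # rows is an iterator over the elements of one partition;
    # seed once here, then shuffle each element
    r.seed(br.value)
    for row in rows:
        shuffled = list(row)
        r.shuffle(shuffled)
        yield shuffled

print(base.mapPartitions(randomize_partition).collect())

The two approaches are not quite the same: seeding inside randomize() makes every row shuffle the same way, while seeding once per partition gives each row a different (but still reproducible, for a fixed partitioning) shuffle.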


Let me know if this helps...





On Wed, May 13, 2015 at 11:41 PM, Charles Hayden <[email protected]>
wrote:

>  Can you elaborate? Broadcast will distribute the seed, which is only
> one number.  But what construct do I use to "plant" the seed (call
> random.seed()) once on each worker?
>  ------------------------------
> *From:* ayan guha <[email protected]>
> *Sent:* Tuesday, May 12, 2015 11:17 PM
> *To:* Charles Hayden
> *Cc:* user
> *Subject:* Re: how to set random seed
>
>
> Easiest way is to broadcast it.
> On 13 May 2015 10:40, "Charles Hayden" <[email protected]> wrote:
>
>>  In pySpark, I am writing a map with a lambda that calls random.shuffle.
>> For testing, I want to be able to give it a seed, so that successive runs
>> will produce the same shuffle.
>>  I am looking for a way to set this same random seed once on each
>> worker.  Is there any simple way to do it?
>>
>>
>>


-- 
Best Regards,
Ayan Guha
