Re: Map one RDD into two RDD

2015-05-08 Thread ayan guha
earn the Parallel Programming Model of an OO Framework like Spark – in any OO Framework lots of Behavior is hidden / encapsulated by the Framework and the client code gets invoked at specific points in the Flow of Control / Data based on callback functions

Re: Map one RDD into two RDD

2015-05-08 Thread anshu shukla
That’s why stuff like RDD.filter(), RDD.filter() may look “sequential” to you but it is not *From:* Bill Q [mailto:bill.q@gmail.com] *Sent:* Thursday, May 7, 2015 6:27 PM *To:* Evo Eftimov

Re: Map one RDD into two RDD

2015-05-07 Thread anshu shukla
in the Flow of Control / Data based on callback functions That’s why stuff like RDD.filter(), RDD.filter() may look “sequential” to you but it is not *From:* Bill Q [mailto:bill.q@gmail.com] *Sent:* Thursday, May 7, 2015 6:27 PM *T

RE: Map one RDD into two RDD

2015-05-07 Thread Evo Eftimov
: Bill Q [mailto:bill.q@gmail.com] Sent: Thursday, May 7, 2015 6:27 PM To: Evo Eftimov Cc: user@spark.apache.org Subject: Re: Map one RDD into two RDD The multi-threading code in Scala is quite simple and you can google it pretty easily. We used the Future framework. You can use Akka also

Re: Map one RDD into two RDD

2015-05-07 Thread Gerard Maas
n Parallel Pipelines / DAGs within the Spark Framework RDD1 = RDD.filter() RDD2 = RDD.filter() *From:* Bill Q [mailto:bill.q@gmail.com] *Sent:* Thursday, May 7, 2015 4:55 PM *To:* Evo Eftimov

Re: Map one RDD into two RDD

2015-05-07 Thread Bill Q
ipelines / DAGs within the Spark Framework RDD1 = RDD.filter() RDD2 = RDD.filter() *From:* Bill Q [mailto:bill.q@gmail.com] *Sent:* Thursday, May 7, 2015 4:55 PM *To:* Evo Eftimov *Cc:* user@spark.apache.org *Subject:*

RE: Map one RDD into two RDD

2015-05-07 Thread Evo Eftimov
: Bill Q [mailto:bill.q@gmail.com] Sent: Thursday, May 7, 2015 4:55 PM To: Evo Eftimov Cc: user@spark.apache.org Subject: Re: Map one RDD into two RDD Thanks for the replies. We decided to use concurrency in Scala to do the two mappings using the same source RDD in parallel. So far, it
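A rough sketch of what running two jobs over the same source RDD from separate threads might look like with Scala `Future`s. The thread does not show Bill's code, so everything here is an assumption: the even/odd rule, the names `ConcurrentJobs` and `countBoth`, the local master, and the use of `count()` as a stand-in for whatever action saves each output.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object ConcurrentJobs {
  // Submits two Spark jobs from separate threads and waits for both results.
  def countBoth(source: RDD[Int]): (Long, Long) = {
    // Assumption: mark the input for caching so that later jobs can reuse it.
    source.cache()
    // Hypothetical rule: even numbers are one type, odd numbers the other.
    val evens = Future { source.filter(_ % 2 == 0).count() }
    val odds  = Future { source.filter(_ % 2 != 0).count() }
    Await.result(evens.zip(odds), Duration.Inf)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("concurrent-jobs").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (evenCount, oddCount) = countBoth(sc.parallelize(1 to 10))
      println(s"evens=$evenCount odds=$oddCount")
    } finally sc.stop()
  }
}
```

The transformations themselves are lazy, so only the actions inside each `Future` trigger work; without the threads, the two actions would simply run one after the other.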

Re: Map one RDD into two RDD

2015-05-07 Thread Gerard Maas
king. Any comments? On Wednesday, May 6, 2015, Evo Eftimov wrote: RDD1 = RDD.filter() RDD2 = RDD.filter() *From:* Bill Q [mailto:bill.q@gmail.com] *Sent:* Tuesday, May 5, 2015 10:42 PM *To:* user

Re: Map one RDD into two RDD

2015-05-07 Thread Bill Q
*From:* Bill Q [mailto:bill.q@gmail.com] *Sent:* Tuesday, May 5, 2015 10:42 PM *To:* user@spark.apache.org *Subject:* Map one RDD into two RDD Hi all, I have a large RDD that I map a function to it. Based on the nature of each record in

RE: Map one RDD into two RDD

2015-05-06 Thread Evo Eftimov
RDD1 = RDD.filter() RDD2 = RDD.filter() From: Bill Q [mailto:bill.q@gmail.com] Sent: Tuesday, May 5, 2015 10:42 PM To: user@spark.apache.org Subject: Map one RDD into two RDD Hi all, I have a large RDD that I map a function to it. Based on the nature of each record in the input RDD
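A minimal sketch of the two-filter approach in Scala, assuming a local SparkContext and a made-up rule (even vs. odd integers) standing in for "the nature of each record". The names `TwoFilters`, `split`, `typeA`, and `typeB` are hypothetical. Caching the source is an addition not mentioned in this message; it is there so the input need not be recomputed for each filter.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object TwoFilters {
  // Hypothetical rule: even numbers are "type A", odd numbers are "type B".
  def split(source: RDD[Int]): (RDD[Int], RDD[Int]) = {
    // Assumption: mark the input for caching so both filters can reuse it.
    source.cache()
    val typeA = source.filter(_ % 2 == 0)
    val typeB = source.filter(_ % 2 != 0)
    (typeA, typeB)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("two-filters").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (typeA, typeB) = split(sc.parallelize(1 to 10))
      println(typeA.collect().mkString(","))
      println(typeB.collect().mkString(","))
    } finally sc.stop()
  }
}
```

Each filter is a separate pass over the source, so the cost is roughly two scans of cached data rather than one scan that writes to two outputs.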

Re: Map one RDD into two RDD

2015-05-05 Thread Ted Yu
Have you looked at RDD#randomSplit() (as an example)? Cheers On Tue, May 5, 2015 at 2:42 PM, Bill Q wrote: Hi all, I have a large RDD that I map a function to it. Based on the nature of each record in the input RDD, I will generate two types of data. I would like to save each type into it
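For reference, a short sketch of what `randomSplit` does, assuming a local SparkContext; the weights, the seed, and the names `RandomSplitExample` and `split` are illustrative. It assigns records to the output RDDs by random sampling according to the weights, so it shows the one-RDD-to-several shape rather than a split based on record content.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object RandomSplitExample {
  // Illustrative weights and seed; records are assigned by random sampling,
  // not by inspecting their content.
  def split(source: RDD[Int]): (RDD[Int], RDD[Int]) = {
    val Array(first, second) = source.randomSplit(Array(0.7, 0.3), seed = 42L)
    (first, second)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("random-split").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (first, second) = split(sc.parallelize(1 to 1000))
      println(s"first=${first.count()} second=${second.count()}")
    } finally sc.stop()
  }
}
```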

Map one RDD into two RDD

2015-05-05 Thread Bill Q
Hi all, I have a large RDD that I map a function to it. Based on the nature of each record in the input RDD, I will generate two types of data. I would like to save each type into its own RDD. But I can't seem to find an efficient way to do it. Any suggestions? Many thanks. Bill -- Many thank