Any update on the mail above? Can anyone suggest the logic for this: I have to filter tweets so that tweets with a particular #hashtag1 are submitted to Spark SQL databases, while tweets with #hashtag2 are passed to a sentiment analysis phase. The problem is how to split the input data into two streams using the hashtags.
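A minimal sketch of how such a split can look, filtering the same DStream twice; ssc and Utils.getAuth are the same as in the snippet quoted below, and the two foreachRDD bodies are placeholders:

import org.apache.spark.streaming.twitter.TwitterUtils

val tweetStream = TwitterUtils.createStream(ssc, Utils.getAuth)

// Filtering the same DStream twice yields two logical streams;
// both predicates are evaluated over each micro-batch.
val hashtag1Stream = tweetStream.filter(_.getText.contains("#hashtag1"))
val hashtag2Stream = tweetStream.filter(_.getText.contains("#hashtag2"))

hashtag1Stream.foreachRDD { rdd =>
  // placeholder: convert rdd to a DataFrame / Spark SQL table and persist it
}
hashtag2Stream.foreachRDD { rdd =>
  // placeholder: hand rdd to the sentiment-analysis stage
}

Note that getText.contains is a deliberately crude match; twitter4j's getHashtagEntities gives exact hashtag parsing.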
On Fri, May 8, 2015 at 2:42 AM, anshu shukla <anshushuk...@gmail.com> wrote:

One of the best discussions on this mailing list :-) Please help me in concluding.

The whole discussion concludes that:

1. The framework does not support increasing the parallelism of a task through any built-in function.
2. The user has to manually write logic to filter the output of an upstream node in the DAG in order to manage the input to downstream nodes (like shuffle grouping etc. in Storm).
3. If we want to increase the level of parallelism of the Twitter streaming source to get a higher rate of the DStream of tweets (to increase the input rate), how is that possible?

val tweetStream = TwitterUtils.createStream(ssc, Utils.getAuth)

On Fri, May 8, 2015 at 2:16 AM, Evo Eftimov <evo.efti...@isecc.com> wrote:

1. Will rdd2.filter run before rdd1.filter finishes?

YES

2. We have to traverse the RDD twice. Any comments?

You can invoke filter, or any other transformation/function, as many times as you like.

PS: You have to study/learn the parallel programming model of an OO framework like Spark. In any OO framework, much of the behavior is hidden/encapsulated by the framework, and the client code gets invoked at specific points in the flow of control/data via callback functions.

That is why code like RDD.filter(), RDD.filter() may look "sequential" to you, but it is not.

From: Bill Q [mailto:bill.q....@gmail.com]
Sent: Thursday, May 7, 2015 6:27 PM
To: Evo Eftimov
Cc: user@spark.apache.org
Subject: Re: Map one RDD into two RDD

The multi-threading code in Scala is quite simple, and you can google it pretty easily. We used the Future framework. You can also use Akka.

@Evo My concerns about the filtering solution are: 1. Will rdd2.filter run before rdd1.filter finishes? 2. We have to traverse the RDD twice. Any comments?

On Thursday, May 7, 2015, Evo Eftimov <evo.efti...@isecc.com> wrote:

Scala is a language; Spark is an OO/functional, distributed framework facilitating parallel programming in a distributed environment.

Any "Scala parallelism" occurs within the parallel model imposed by the Spark OO framework, i.e. it is limited in what it can achieve in terms of influencing the Spark framework's behavior. That is the nature of programming with/for frameworks.

When RDD1 and RDD2 are partitioned and different actions are applied to them, this results in parallel pipelines/DAGs within the Spark framework:

RDD1 = RDD.filter()
RDD2 = RDD.filter()

From: Bill Q [mailto:bill.q....@gmail.com]
Sent: Thursday, May 7, 2015 4:55 PM
To: Evo Eftimov
Cc: user@spark.apache.org
Subject: Re: Map one RDD into two RDD

Thanks for the replies. We decided to use concurrency in Scala to run the two mappings over the same source RDD in parallel. So far, it seems to be working. Any comments?
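A minimal sketch of that Future-based approach, assuming sc is an existing SparkContext; the input path, output paths, and the two transforms are placeholders:

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// Shared source RDD; cache() so the second job reuses it instead of re-reading.
val source = sc.textFile("hdfs:///input").cache()

def toTypeA(line: String): String = line.toUpperCase   // placeholder transforms
def toTypeB(line: String): String = line.toLowerCase

// Each Future submits an independent Spark action from its own driver thread,
// so the two jobs can be scheduled concurrently.
val jobA = Future { source.map(toTypeA).saveAsTextFile("hdfs:///out/typeA") }
val jobB = Future { source.map(toTypeB).saveAsTextFile("hdfs:///out/typeB") }

Await.result(Future.sequence(Seq(jobA, jobB)), Duration.Inf)

Whether the two jobs actually overlap depends on the scheduler configuration (e.g. the FAIR scheduler) and on free executor capacity; otherwise they queue behind each other.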
On Wednesday, May 6, 2015, Evo Eftimov <evo.efti...@isecc.com> wrote:

RDD1 = RDD.filter()
RDD2 = RDD.filter()

From: Bill Q [mailto:bill.q....@gmail.com]
Sent: Tuesday, May 5, 2015 10:42 PM
To: user@spark.apache.org
Subject: Map one RDD into two RDD

Hi all,

I have a large RDD over which I map a function. Based on the nature of each record in the input RDD, I will generate two types of data, and I would like to save each type into its own RDD. But I can't seem to find an efficient way to do it. Any suggestions?

Many thanks.

Bill

--
Thanks & Regards,
Anshu Shukla
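A minimal sketch of the two-filter split suggested above, combined with cache() to address the "traverse the RDD twice" concern; sc, the input path, the boolean classifier, and the output paths are placeholders:

// Map once and cache, so the two filters below scan the cached,
// already-mapped data instead of recomputing the expensive map twice.
val source = sc.textFile("hdfs:///input")
val tagged = source.map(line => (line.startsWith("A"), line)).cache()

val rdd1 = tagged.filter(_._1).map(_._2)        // type-A records
val rdd2 = tagged.filter(t => !t._1).map(_._2)  // type-B records

rdd1.saveAsTextFile("hdfs:///out/typeA")
rdd2.saveAsTextFile("hdfs:///out/typeB")

The classification map runs once; each filter pass is then a cheap scan of the cached result, and the two saves can be submitted from separate threads (as in the Future sketch above) if they should run concurrently.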