Any update on the above mail?

Also, can anyone suggest the logic for this: I have to filter tweets so
that tweets with a particular #hashtag1 are submitted to Spark SQL tables,
while tweets with #hashtag2 are passed to a sentiment-analysis phase. "The
problem is how to split the input data into two streams using hashtags."
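A minimal sketch of that split, on plain Scala collections (a runnable Spark cluster isn't assumed here): the `Seq` stands in for the DStream, and the same predicates would be passed to `tweetStream.filter(...)` in the streaming job. The tag-matching rule (whitespace-separated tokens) and the names `toSql` / `toSentiment` are illustrative assumptions, not from this thread.

```scala
// Hashtag-split logic sketched on plain collections; in Spark Streaming
// the same predicates would go to DStream.filter, giving two DStreams.
object HashtagSplit {
  // Assumed matching rule: the tag appears as a whitespace-separated token.
  def hasTag(tweet: String, tag: String): Boolean =
    tweet.split("\\s+").contains(tag)

  // Illustrative sample data standing in for the tweet stream.
  val tweets = Seq(
    "spark is great #hashtag1",
    "storm vs spark #hashtag2",
    "both tags #hashtag1 #hashtag2")

  // In Spark Streaming: tweetStream.filter(hasTag(_, "#hashtag1")) etc.
  val toSql       = tweets.filter(hasTag(_, "#hashtag1")) // -> Spark SQL
  val toSentiment = tweets.filter(hasTag(_, "#hashtag2")) // -> sentiment

  def main(args: Array[String]): Unit = {
    println(toSql.size)       // 2
    println(toSentiment.size) // 2
  }
}
```

Note that a tweet carrying both hashtags lands in both outputs, which is usually what you want for independent downstream phases.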

On Fri, May 8, 2015 at 2:42 AM, anshu shukla <anshushuk...@gmail.com> wrote:

> One of the best discussions on the mailing list :-) ... Please help me in
> concluding --
>
> The whole discussion concludes that:
>
> 1- The framework does not provide a built-in function for increasing the
> parallelism of an arbitrary task.
> 2- The user has to manually write logic to filter the output of an
> upstream node in the DAG in order to manage the input to downstream nodes
> (like shuffle grouping etc. in Storm).
> 3- If we want to increase the level of parallelism of the Twitter
> streaming receiver to get a higher-rate DStream of tweets (i.e. to
> increase the input rate), how is that possible?
>
>   val tweetStream = TwitterUtils.createStream(ssc, Utils.getAuth)
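On point 3, a common pattern is to create several receivers and union them into one DStream, roughly `val streams = (1 to n).map(_ => TwitterUtils.createStream(ssc, Utils.getAuth))` followed by `ssc.union(streams)`. Since a Spark cluster can't run here, the sketch below models only the union step on plain collections; `receiverBatch` and all names in it are illustrative stand-ins.

```scala
// Multi-receiver pattern sketched on collections.  In Spark Streaming:
//   val streams = (1 to n).map(_ => TwitterUtils.createStream(ssc, auth))
//   val unioned = ssc.union(streams)
object MultiReceiver {
  // Stand-in for one receiver's batch of tweets (hypothetical data).
  def receiverBatch(id: Int): Seq[String] =
    Seq(s"tweet-a-from-$id", s"tweet-b-from-$id")

  val numReceivers = 3

  // One "stream" per receiver, flattened into a single stream,
  // as ssc.union would do with real DStreams.
  val unioned = (1 to numReceivers).flatMap(receiverBatch)

  def main(args: Array[String]): Unit =
    println(unioned.size) // 6
}
```

Each receiver occupies one core on an executor, so the cluster needs more cores than receivers for processing to keep up.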
>
>
>
> On Fri, May 8, 2015 at 2:16 AM, Evo Eftimov <evo.efti...@isecc.com> wrote:
>
>> 1. Will rdd2.filter run before rdd1.filter finishes?
>>
>>
>>
>> YES
>>
>>
>>
>> 2. We have to traverse rdd twice. Any comments?
>>
>>
>>
>> You can invoke filter or whatever other transformation / function many
>> times
>>
>> PS: you have to study / learn the parallel programming model of an OO
>> framework like Spark – in any OO framework much of the behavior is hidden /
>> encapsulated by the framework, and the client code gets invoked at
>> specific points in the flow of control / data via callback functions.
>>
>>
>>
>> That’s why code like RDD.filter(), RDD.filter() may look “sequential” to
>> you, but it is not.
>>
>>
>>
>>
>>
>> *From:* Bill Q [mailto:bill.q....@gmail.com]
>> *Sent:* Thursday, May 7, 2015 6:27 PM
>>
>> *To:* Evo Eftimov
>> *Cc:* user@spark.apache.org
>> *Subject:* Re: Map one RDD into two RDD
>>
>>
>>
>> The multi-threading code in Scala is quite simple and you can google it
>> pretty easily. We used the Future framework. You can also use Akka.
>>
>>
>>
>> @Evo My concerns with the filtering solution are: 1. Will rdd2.filter run
>> before rdd1.filter finishes? 2. We have to traverse the RDD twice. Any
>> comments?
>>
>>
>>
>> On Thursday, May 7, 2015, Evo Eftimov <evo.efti...@isecc.com> wrote:
>>
>> Scala is a language; Spark is an OO/functional, distributed framework
>> facilitating parallel programming in a distributed environment.
>>
>>
>>
>> Any “Scala parallelism” occurs within the parallel model imposed by the
>> Spark framework – i.e. it is limited in terms of how much it can
>> influence the Spark framework's behavior – that is the nature of
>> programming with/for frameworks.
>>
>>
>>
>> When RDD1 and RDD2 are partitioned and different actions are applied to
>> them, this results in parallel pipelines / DAGs within the Spark framework:
>>
>> RDD1 = RDD.filter()
>>
>> RDD2 = RDD.filter()
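On the "traverse the RDD twice" concern: if the source RDD is cached (`rdd.cache()`), the second filter reads the materialized partitions instead of recomputing the lineage from scratch. The collection-level sketch below models that effect (an `Iterator` stands in for an uncached lineage that is rebuilt per traversal, a `Vector` for a cached RDD; all names are illustrative):

```scala
// Why RDD.cache() matters when one RDD feeds two filters: without it,
// each action recomputes the source; with it, the source is built once.
object CacheSketch {
  private var builds = 0

  // Stand-in for an expensive lineage: rebuilt every time it is asked for.
  def expensiveSource(): Iterator[Int] = {
    builds += 1 // count how often the source is (re)computed
    Iterator(1, 2, 3, 4)
  }

  // Uncached: two traversals rebuild the source twice.
  def buildsWithoutCache(): Int = {
    builds = 0
    val evens = expensiveSource().filter(_ % 2 == 0).toList
    val odds  = expensiveSource().filter(_ % 2 != 0).toList
    builds
  }

  // "Cached": materialize once (rdd.cache() plays this role in Spark),
  // then both filters read the stored data.
  def buildsWithCache(): Int = {
    builds = 0
    val cached = expensiveSource().toVector
    val evens  = cached.filter(_ % 2 == 0)
    val odds   = cached.filter(_ % 2 != 0)
    builds
  }

  def main(args: Array[String]): Unit = {
    println(buildsWithoutCache()) // 2
    println(buildsWithCache())    // 1
  }
}
```

In the Spark version this is simply `val src = rdd.cache()` before the two `src.filter(...)` calls.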
>>
>>
>>
>>
>>
>> *From:* Bill Q [mailto:bill.q....@gmail.com]
>> *Sent:* Thursday, May 7, 2015 4:55 PM
>> *To:* Evo Eftimov
>> *Cc:* user@spark.apache.org
>> *Subject:* Re: Map one RDD into two RDD
>>
>>
>>
>> Thanks for the replies. We decided to use concurrency in Scala to do the
>> two mappings using the same source RDD in parallel. So far, it seems to be
>> working. Any comments?
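For reference, a minimal sketch of that Futures approach on plain collections (in a Spark driver each `Future` body would be an action on the same, ideally cached, source RDD; the data and names here are illustrative assumptions):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Two derivations of one source submitted concurrently via Futures.
// In Spark, each Future body would be an action (e.g. count, saveAsTextFile)
// on the shared source RDD, letting the scheduler run both jobs in parallel.
object TwoFutures {
  val source = Vector(1, 2, 3, 4, 5)

  def run(): (Int, Int) = {
    val evensF = Future(source.count(_ % 2 == 0)) // first "job"
    val oddsF  = Future(source.count(_ % 2 != 0)) // second "job"
    // Block the driver until both concurrent computations finish.
    (Await.result(evensF, 10.seconds), Await.result(oddsF, 10.seconds))
  }

  def main(args: Array[String]): Unit = {
    val (evens, odds) = run()
    println(s"$evens $odds") // 2 3
  }
}
```

As noted earlier in the thread, the source RDD should be cached in this pattern, otherwise each concurrent action recomputes it.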
>>
>> On Wednesday, May 6, 2015, Evo Eftimov <evo.efti...@isecc.com> wrote:
>>
>> RDD1 = RDD.filter()
>>
>> RDD2 = RDD.filter()
>>
>>
>>
>> *From:* Bill Q [mailto:bill.q....@gmail.com <bill.q....@gmail.com>]
>> *Sent:* Tuesday, May 5, 2015 10:42 PM
>> *To:* user@spark.apache.org
>> *Subject:* Map one RDD into two RDD
>>
>>
>>
>> Hi all,
>>
>> I have a large RDD to which I map a function. Based on the nature of each
>> record in the input RDD, I generate two types of data, and I would like
>> to save each type into its own RDD. But I can't seem to find an efficient
>> way to do it. Any suggestions?
>>
>>
>>
>> Many thanks.
>>
>>
>>
>>
>>
>> Bill
>>
>>
>>
>> --
>>
>> Many thanks.
>>
>> Bill
>>
>>
>>
>
>
>
> --
> Thanks & Regards,
> Anshu Shukla
>



-- 
Thanks & Regards,
Anshu Shukla
