What you have is sequential code, and hence sequential processing.
Also, Spark/Scala are not parallel programming languages as such.
But even if they were, statements are executed sequentially unless you exploit
the parallel/concurrent execution features.
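To make that concrete, here is a minimal Spark-free sketch in plain Scala (the slowSquare helper is hypothetical, standing in for any slow computation such as a Spark action): two calls on the same thread always run back to back, while wrapping them in Futures lets them overlap.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SequentialVsConcurrent {
  // Stand-in for any slow computation (e.g. a Spark action).
  def slowSquare(n: Int): Int = { Thread.sleep(100); n * n }

  def main(args: Array[String]): Unit = {
    // Sequential: the second call starts only after the first returns.
    val a = slowSquare(3)
    val b = slowSquare(4)

    // Concurrent: both calls run at the same time on the global pool;
    // only the explicit Futures make that happen.
    val fa = Future(slowSquare(3))
    val fb = Future(slowSquare(4))
    val ca = Await.result(fa, 2.seconds)
    val cb = Await.result(fb, 2.seconds)

    println(s"sequential: $a, $b; concurrent: $ca, $cb")
  }
}
```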

Anyway, see if this works:

val (RDD1, RDD2) = (JavaFunctions.cassandraTable(...), JavaFunctions.cassandraTable(...))

val (RDD3, RDD4) = (RDD1.flatMap(..), RDD2.flatMap(..))

I am hoping that, since Spark is based on Scala, the behavior below will apply:
scala> var x = 0
x: Int = 0

scala> val (a,b) = (x + 1, x+1)
a: Int = 1
b: Int = 1
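If the tuple form does not buy any overlap (the tuple elements are still evaluated one after the other on the driver), the concurrent execution features mentioned above can be used explicitly: submit each count() from its own thread, e.g. with Futures. A minimal runnable sketch, using plain Scala Seqs as stand-ins for RDD3/RDD4 (with real RDDs the same Future pattern submits the two Spark jobs concurrently, assuming enough free executor cores to serve both):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelCounts {
  // Stand-ins for RDD3 and RDD4; replace with the real RDDs in Spark code.
  val rdd3 = Seq(1, 2, 3, 4)
  val rdd4 = Seq(5, 6, 7)

  def main(args: Array[String]): Unit = {
    // Each Future submits its count from a separate thread, so the two
    // jobs can run concurrently instead of one after the other.
    val count3 = Future(rdd3.size.toLong) // with an RDD: Future(RDD3.count())
    val count4 = Future(rdd4.size.toLong) // with an RDD: Future(RDD4.count())

    val c3 = Await.result(count3, 10.seconds)
    val c4 = Await.result(count4, 10.seconds)
    println(s"count3 = $c3, count4 = $c4")
  }
}
```

With real Spark jobs you would also typically set spark.scheduler.mode=FAIR so that concurrently submitted jobs share the cluster instead of queueing FIFO.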

From: Cassa L <lcas...@gmail.com>
Date: Friday, October 27, 2017 at 1:50 AM
To: Jörn Franke <jornfra...@gmail.com>
Cc: user <u...@spark.apache.org>, <user@cassandra.apache.org>
Subject: Re: Why don't I see my spark jobs running in parallel in 
Cassandra/Spark DSE cluster?

No, I dont use Yarn.  This is standalone spark that comes with DataStax 
Enterprise version of Cassandra.

On Thu, Oct 26, 2017 at 11:22 PM, Jörn Franke <jornfra...@gmail.com> wrote:
Do you use yarn ? Then you need to configure the queues with the right 
scheduler and method.

On 27. Oct 2017, at 08:05, Cassa L <lcas...@gmail.com> wrote:
I have a Spark job with the following use case:
RDD1 and RDD2 are read from Cassandra tables. These two RDDs then go through some
transformations, and after that I do a count() on the transformed data.

The code looks somewhat like this:

RDD3 = RDD1.flatMap(..)
RDD4 = RDD2.flatMap(..)
RDD3.count()
RDD4.count()


In the Spark UI I see the count() functions getting called one after another. How
do I make them run in parallel? I also looked at the discussion below from Cloudera, but it
does not show how to run driver-side functions in parallel. Do I just add executors
and run the counts in separate threads?


Attaching UI snapshot here: <Screen Shot 2017-10-26 at 10.54.51 PM.png>

