What you have is sequential code, and hence sequential processing.
Also, Spark/Scala are not parallel programming languages.
But even if they were, statements execute sequentially unless you exploit
the parallel/concurrent execution features.
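One such feature is ordinary Scala concurrency on the driver: wrapping each blocking action (such as count()) in a Future lets both jobs be submitted at the same time. Here is a minimal sketch in plain Scala — no Spark needed to run it; the sleeps and return values are stand-ins for the real blocking actions:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Stand-ins for two blocking Spark actions (e.g. rdd3.count() and
// rdd4.count()). Names and return values here are illustrative only.
def countA(): Long = { Thread.sleep(200); 42L }
def countB(): Long = { Thread.sleep(200); 7L }

// Each action is kicked off on its own driver thread. With real Spark
// jobs, this lets the scheduler run both jobs concurrently, provided
// executor cores are available.
val fa = Future(countA())
val fb = Future(countB())

val a = Await.result(fa, 5.seconds)
val b = Await.result(fb, 5.seconds)
println(s"a=$a b=$b")
```

Note that concurrent submission only helps if the cluster has spare capacity; otherwise the jobs still queue behind one another.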
Anyway, see if this works:
val (RDD1, RDD2) = (JavaFunctions.cassandraTable(...), JavaFunctions.cassandraTable(...))
val (RDD3, RDD4) = (RDD1.flatMap(..), RDD2.flatMap(..))
I am hoping that, since Spark is based on Scala, the behavior below will apply:
scala> var x = 0
x: Int = 0
scala> val (a,b) = (x + 1, x+1)
a: Int = 1
b: Int = 1
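That behavior does hold in plain Scala: both sides of a tuple are evaluated eagerly, left to right, on the calling thread. A quick check (the function names are illustrative only):

```scala
import scala.collection.mutable.ListBuffer

// Record the order in which the two tuple elements are evaluated.
val order = ListBuffer.empty[String]
def left(): Int  = { order += "left"; 1 }
def right(): Int = { order += "right"; 2 }

val (a, b) = (left(), right())
// Both functions ran, left first: eager but sequential evaluation.
println(order.mkString(","))
```

Keep in mind this shows the two assignments still run one after another on a single driver thread; actual parallelism comes from concurrent job submission or from Spark's own task parallelism within each job.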
From: Cassa L <lcas...@gmail.com>
Date: Friday, October 27, 2017 at 1:50 AM
To: Jörn Franke <jornfra...@gmail.com>
Cc: user <u...@spark.apache.org>, <firstname.lastname@example.org>
Subject: Re: Why don't I see my spark jobs running in parallel in
Cassandra/Spark DSE cluster?
No, I don't use YARN. This is the standalone Spark that comes with the DataStax
Enterprise version of Cassandra.
On Thu, Oct 26, 2017 at 11:22 PM, Jörn Franke
Do you use YARN? If so, you need to configure the queues with the right
scheduler and method.
On 27. Oct 2017, at 08:05, Cassa L
I have a Spark job with the following use case:
RDD1 and RDD2 are read from Cassandra tables. These two RDDs then go through
some transformations, and after that I do a count() on the transformed data.
The code looks somewhat like this:
val RDD3 = RDD1.flatMap(..)
val RDD4 = RDD2.flatMap(..)
RDD3.count()
RDD4.count()
In the Spark UI I see the count() jobs run one after another. How do I make
them run in parallel? I also looked at the discussion below from Cloudera, but
it does not show how to run driver actions in parallel. Do I just add executors
and run the actions in separate threads?
(Attachment: Screen Shot 2017-10-26 at 10.54.51 PM.png — UI snapshot.)