Seems like a question better suited for the Spark mailing list, or DSE support <however you get DSE support>, not OSS Cassandra.
> On Oct 27, 2017, at 8:14 AM, Thakrar, Jayesh <jthak...@conversantmedia.com> wrote:
>
> What you have is sequential code and hence sequential processing.
> Also, Spark/Scala are not parallel programming languages, and even if they
> were, statements are executed sequentially unless you exploit the
> parallel/concurrent execution features.
>
> Anyway, see if this works:
>
> val (RDD1, RDD2) = (JavaFunctions.cassandraTable(...),
>                     JavaFunctions.cassandraTable(...))
>
> val (RDD3, RDD4) = (RDD1.flatMap(..), RDD2.flatMap(..))
>
> I am hoping that, Spark being based on Scala, the behavior below will apply:
>
> scala> var x = 0
> x: Int = 0
>
> scala> val (a, b) = (x + 1, x + 1)
> a: Int = 1
> b: Int = 1
>
>
> From: Cassa L <lcas...@gmail.com>
> Date: Friday, October 27, 2017 at 1:50 AM
> To: Jörn Franke <jornfra...@gmail.com>
> Cc: user <u...@spark.apache.org>, <firstname.lastname@example.org>
> Subject: Re: Why don't I see my spark jobs running in parallel in Cassandra/Spark DSE cluster?
>
> No, I don't use YARN. This is the standalone Spark that comes with the
> DataStax Enterprise version of Cassandra.
>
> On Thu, Oct 26, 2017 at 11:22 PM, Jörn Franke <jornfra...@gmail.com> wrote:
>
> Do you use YARN? Then you need to configure the queues with the right
> scheduler and method.
>
> On 27. Oct 2017, at 08:05, Cassa L <lcas...@gmail.com> wrote:
>
> Hi,
> I have a Spark job with the following use case: RDD1 and RDD2 are read
> from Cassandra tables. These two RDDs then go through some transformations,
> and after that I do a count on the transformed data.
>
> The code looks somewhat like this:
>
> RDD1 = JavaFunctions.cassandraTable(...)
> RDD2 = JavaFunctions.cassandraTable(...)
> RDD3 = RDD1.flatMap(..)
> RDD4 = RDD2.flatMap(..)
>
> RDD3.count
> RDD4.count
>
> In the Spark UI I see the count() actions being executed one after another.
> How do I make them run in parallel? I also looked at the discussion below
> from Cloudera, but it does not show how to run driver functions in parallel.
> Do I just add an Executor and run them in threads?
>
> https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Getting-Spark-stages-to-run-in-parallel-inside-an-application/td-p/38515
>
> <Screen Shot 2017-10-26 at 10.54.51 PM.png> Attaching UI snapshot here.
>
> Thanks,
> LCassa
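
For the record, the tuple assignment suggested above does not by itself run
anything in parallel: RDD transformations are lazy, so no job is submitted
until an action such as count() runs, and a single driver thread blocks on
each action in turn. Below is a minimal sketch of one way to overlap the two
counts, submitting each action from its own driver thread via Scala Futures.
The keyspace/table names and the identity-style flatMap bodies are
placeholders, not from the original code, and sc is assumed to be an existing
SparkContext (e.g. in spark-shell).

    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration.Duration

    import com.datastax.spark.connector._  // adds sc.cassandraTable(...)

    // Placeholder keyspace/table names and transformations.
    val rdd3 = sc.cassandraTable("my_ks", "table1").flatMap(row => Seq(row))
    val rdd4 = sc.cassandraTable("my_ks", "table2").flatMap(row => Seq(row))

    // count() blocks the calling thread until its job finishes, so each
    // action gets its own Future; Spark's scheduler then has both jobs
    // queued at once and can run their tasks concurrently.
    val f3 = Future { rdd3.count() }
    val f4 = Future { rdd4.count() }

    val count3 = Await.result(f3, Duration.Inf)
    val count4 = Await.result(f4, Duration.Inf)
    println(s"counts: $count3, $count4")

Whether the two jobs actually overlap still depends on free executor cores;
if they compete for slots, setting spark.scheduler.mode=FAIR lets their tasks
interleave instead of the first job draining the cluster before the second
starts.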