Check the number of partitions in your input; it may be far smaller than the available parallelism of your cluster. For example, input that lives in just 1 partition will spawn just 1 task.
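As a minimal sketch (the master URL, input path, and jar name below are placeholders, not from your setup -- on a standalone master you can also cap cores with --total-executor-cores):

```shell
# 1) Check how many partitions (and hence tasks) your input produces.
#    From spark-shell connected to the standalone master:
#      scala> sc.textFile("hdfs:///path/to/input").partitions.size
#    1 partition means only 1 task per stage, no matter how many cores exist.
#    rdd.repartition(4) spreads the data across 4 partitions / 4 tasks.

# 2) Submit against the standalone master instead of local[4]:
spark-submit \
  --master spark://10.125.21.15:7070 \
  --total-executor-cores 4 \
  your-app.jar
```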
Beyond that, parallelism just happens. You can see the parallelism of each operation in the Spark UI.

On Thu, Jan 15, 2015 at 10:53 PM, Wang, Ningjun (LNG-NPV) <[email protected]> wrote:
> Spark Standalone cluster.
>
> My program is running very slowly, and I suspect it is not doing parallel
> processing of the RDD. How can I force it to run in parallel? Is there any
> way to check whether it is processed in parallel?
>
> Regards,
>
> Ningjun Wang
> Consulting Software Engineer
> LexisNexis
> 121 Chanlon Road
> New Providence, NJ 07974-1541
>
>
> -----Original Message-----
> From: Sean Owen [mailto:[email protected]]
> Sent: Thursday, January 15, 2015 4:29 PM
> To: Wang, Ningjun (LNG-NPV)
> Cc: [email protected]
> Subject: Re: How to force parallel processing of RDD using multiple thread
>
> What is your cluster manager? For example, on YARN you would specify
> --executor-cores. Read:
> http://spark.apache.org/docs/latest/running-on-yarn.html
>
> On Thu, Jan 15, 2015 at 8:54 PM, Wang, Ningjun (LNG-NPV)
> <[email protected]> wrote:
>> I have a standalone Spark cluster with only one node with 4 CPU cores.
>> How can I force Spark to do parallel processing of my RDD using
>> multiple threads? For example, I can do the following:
>>
>> spark-submit --master local[4]
>>
>> However, I really want to use the cluster as follows:
>>
>> spark-submit --master spark://10.125.21.15:7070
>>
>> In that case, how can I make sure the RDD is processed with multiple
>> threads/cores?
>>
>> Thanks
>>
>> Ningjun
