---------- Forwarded message ----------
From: Liquan Pei <liquan...@gmail.com>
Date: Mon, Sep 29, 2014 at 2:12 PM
Subject: Re: about partition number
To: anny9699 <anny9...@gmail.com>

The number of cores available in your cluster determines the number of tasks that can run concurrently. If your data is evenly partitioned, the number of partitions should be approximately equal to total_coreNumber.

Liquan

On Mon, Sep 29, 2014 at 2:01 PM, anny9699 <anny9...@gmail.com> wrote:

> Hi,
>
> I read the past posts about partition number, but am still a little confused
> about partitioning strategy.
>
> I have a cluster with 8 workers and 2 cores per worker. Is it true that the
> optimal partition number should be 2-4 * total_coreNumber, or should it be
> approximately equal to total_coreNumber? Or is it the number of tasks, rather
> than the number of partitions, that really determines the speed?
>
> Thanks a lot!
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/about-partition-number-tp15362.html

--
Liquan Pei
Department of Physics
University of Massachusetts Amherst
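
To make the arithmetic in this thread concrete, here is a minimal sketch in plain Python (no Spark required) for the cluster described in the question: 8 workers with 2 cores each. It assumes one task per partition and one core per task, and the helper names `total_cores` and `task_waves` are hypothetical, introduced only for illustration. It shows how many rounds ("waves") of tasks a stage would need for the partition counts discussed above; it does not measure actual speed.

```python
import math


def total_cores(workers: int, cores_per_worker: int) -> int:
    """Total cores in the cluster, i.e. the maximum number of concurrent tasks
    (assuming each task occupies exactly one core)."""
    return workers * cores_per_worker


def task_waves(num_partitions: int, cores: int) -> int:
    """Rounds of tasks needed to process every partition once,
    assuming one task per partition and evenly sized partitions."""
    return math.ceil(num_partitions / cores)


# Cluster from the question: 8 workers, 2 cores per worker.
cores = total_cores(workers=8, cores_per_worker=2)

# Candidate partition counts: 1x, 2x and 4x total_coreNumber.
for factor in (1, 2, 4):
    partitions = factor * cores
    waves = task_waves(partitions, cores)
    print(f"{factor} x cores: {partitions} partitions, {waves} wave(s) of tasks")
```

With evenly sized partitions, a count equal to the core count finishes in a single wave, which matches the reply above. A count slightly above the core count (for example 17 partitions on 16 cores) needs a second wave in which most cores sit idle. A chosen count could then be passed to something like `rdd.repartition(n)` in Spark.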