Re: Maximum Core Utilization
Also, if not already done, you may want to try repartitioning your data to 50 partitions.
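For reference, a rough sketch of that repartitioning in the Scala RDD API (the input path is made up, and sc is assumed to be an existing SparkContext):

// Read the ~350,000-line dataset; a small file often yields only a
// handful of partitions, which caps the number of concurrent tasks.
val lines = sc.textFile("hdfs:///path/to/data.txt")  // hypothetical path
println(lines.partitions.length)                     // often just a few here

// Shuffle the data into 50 partitions so up to 50 tasks can run at once.
val repartitioned = lines.repartition(50)
println(repartitioned.partitions.length)             // 50

Note that repartition triggers a full shuffle, but for a dataset this small the shuffle cost is negligible compared to the gain in parallelism.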
Maximum Core Utilization
Hi All,

For a job I am running on Spark with a dataset of, say, 350,000 lines (not big), I am finding that even though my cluster has a large number of cores available (like 100 cores), the Spark system seems to stop after using just 4 cores; after that, the runtime is pretty much a straight line no matter how many more cores are thrown at it.

I am wondering if Spark tries to figure out the maximum number of cores to use based on the size of the dataset. If yes, is there a way to disable this feature and force it to use all the cores available?

Thanks,
Manu

--
The greater danger for most of us lies not in setting our aim too high and falling short; but in setting our aim too low, and achieving our mark. - Michelangelo
Re: Maximum Core Utilization
Hi, do you have information on how many partitions/tasks the stage/job is running? By default there is one core per task, and your number of concurrent tasks may be limiting core utilization.

There are a few settings you could play with, assuming your issue is related to the above:

spark.default.parallelism
spark.cores.max
spark.task.cpus
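For what it's worth, a minimal sketch of how those settings could be applied when building the context (Spark 1.x Scala API; the app name, values, and input path are illustrative, and the master URL is assumed to be supplied via spark-submit):

import org.apache.spark.{SparkConf, SparkContext}

object MaxCoreUtilization {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("MaxCoreUtilization")        // illustrative app name
      .set("spark.cores.max", "100")           // cap on cores requested from the cluster
      .set("spark.default.parallelism", "100") // default partition count for shuffles
      .set("spark.task.cpus", "1")             // cores reserved per task (1 is the default)
    val sc = new SparkContext(conf)

    // Even with these settings, a small input file may still produce only a
    // few partitions, so repartition explicitly to allow 100 concurrent tasks.
    val count = sc.textFile("hdfs:///path/to/data.txt") // hypothetical path
      .repartition(100)
      .count()
    println(count)

    sc.stop()
  }
}

The key point is that these settings raise the ceiling on available cores, but the actual concurrency is still bounded by the number of partitions in each stage.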