I would suggest using dynamic allocation rather than a fixed number of
executors, and setting the executor memory you would like. Deciding on the
exact values takes a couple of test runs to see what works best for your
workload.
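
As a very rough starting point (the numbers below are only illustrative, and
spark.yarn.executor.memoryOverhead applies only if you run on YARN), you can
size things from the hardware - e.g. for 7 workers with 8 cores and 4GB each:

  executor cores            : 2-4 (smaller executors tend to keep GC pauses manageable)
  executors per node        : worker cores / executor cores, leaving ~1 core for OS/daemons
  executor memory           : ~(worker memory - ~1GB for the OS) / executors per node,
                              minus ~10% for overhead (spark.yarn.executor.memoryOverhead)
  partitions / parallelism  : ~2-3x the total executor cores in the cluster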

You could try something like -

--driver-memory 1G \
--executor-memory 2G \
--executor-cores 2 \
--conf spark.dynamicAllocation.enabled=true \
--conf spark.dynamicAllocation.initialExecutors=8

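Note that dynamic allocation also needs the external shuffle service enabled
on the workers, and it is worth bounding the executor count. A fuller sketch
(the master URL, class, jar and all the numbers are placeholders - adjust to
your cluster):

./bin/spark-submit \
  --master spark://your-master:7077 \
  --class your.main.Class \
  --driver-memory 1G \
  --executor-memory 2G \
  --executor-cores 2 \
  --conf spark.shuffle.service.enabled=true \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=2 \
  --conf spark.dynamicAllocation.initialExecutors=8 \
  --conf spark.dynamicAllocation.maxExecutors=14 \
  your-app.jar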


On Tue, Jul 12, 2016 at 1:27 PM, Anuj Kumar <anujs...@gmail.com> wrote:

> That configuration looks bad, with only two cores and 1 GB of memory in use
> by the app. A few points-
>
> 1. Please oversubscribe those CPUs to at least twice the number of physical
> cores you have to start with, and then tune down if it freezes
> 2. Allocate all of the CPU cores and memory to your running app (I assume
> it is your test environment)
> 3. Assuming each node is a quad-core machine, if you define 8 cores per
> worker you will get 56 cores (CPU threads) across the 7 workers
> 4. It also depends on the source you are reading the data from. If you are
> reading from HDFS, what are your block size and part count?
> 5. You may also have to tune the timeouts and frame-size based on the
> dataset and the errors you are seeing (see the sketch below)
>
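> A sketch of the kind of settings point 5 refers to (the values are purely
> illustrative, and the frame-size property name depends on your Spark version -
> spark.akka.frameSize on 1.x, spark.rpc.message.maxSize on 2.x, both in MB):
>
> --conf spark.network.timeout=300s \
> --conf spark.rpc.askTimeout=300s \
> --conf spark.akka.frameSize=128
>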
> We have run terasort with a couple of high-end worker machines reading and
> writing from HDFS, with 5-10 mount points allocated for HDFS and the Spark
> local directories. We have used multiple configurations, like-
> 10W-10CPU-10GB and 25W-6CPU-6GB (workers-CPUs-memory per worker) running on
> each of the two machines, with 512MB HDFS blocks and 1000-2000 parts. All of
> this, chatting at 10GbE, worked well.
>
> On Tue, Jul 12, 2016 at 3:39 AM, Kartik Mathur <kar...@bluedata.com>
> wrote:
>
>> I am trying to run terasort in Spark on a 7-node cluster with only 10GB
>> of data, and executors get lost with a "GC overhead limit exceeded" error.
>>
>> This is what my cluster looks like -
>>
>>
>>    - *Alive Workers:* 7
>>    - *Cores in use:* 28 Total, 2 Used
>>    - *Memory in use:* 56.0 GB Total, 1024.0 MB Used
>>    - *Applications:* 1 Running, 6 Completed
>>    - *Drivers:* 0 Running, 0 Completed
>>    - *Status:* ALIVE
>>
>> Each worker has 8 cores and 4GB memory.
>>
>> My question is: how do people running in production decide these
>> properties -
>>
>> 1) --num-executors
>> 2) --executor-cores
>> 3) --executor-memory
>> 4) num of partitions
>> 5) spark.default.parallelism
>>
>> Thanks,
>> Kartik
>>
>>
>>
>
