That configuration looks off: only two cores are in use and the app has just
1 GB allocated. A few points:

1. Oversubscribe those CPUs to at least twice the number of physical cores
as a starting point, then tune down if the machines freeze
2. Allocate all of the CPU cores and memory to your running app (I assume
this is your test environment)
3. Assuming you are running quad-core machines, defining 8 cores per worker
gives you 56 cores (CPU threads) across your 7 workers
4. It also depends on the source you are reading the data from. If you are
reading from HDFS, what are your block size and part count?
5. You may also have to tune the network timeouts and RPC frame size based
on the dataset and the errors you are seeing (see the sketch after this
list for where these settings live)
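
As a minimal sketch, assuming the standalone cluster from the quoted mail
below, points 2 and 5 map to application settings like these. The values
are illustrative starting points to tune, not recommendations:

    // Illustrative SparkConf for points 2 and 5; every value here is an
    // example, not a recommendation. Oversubscription (points 1 and 3) is
    // configured on the workers instead, via SPARK_WORKER_CORES in
    // conf/spark-env.sh (e.g. 8 on a quad-core box).
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("terasort-tuning")        // hypothetical app name
      .set("spark.cores.max", "56")         // claim all 7 x 8 advertised cores
      .set("spark.executor.memory", "3g")   // leave OS headroom out of 4 GB/worker
      .set("spark.network.timeout", "300s") // widen if executors keep dropping
      .set("spark.akka.frameSize", "256")   // MB; spark.rpc.message.maxSize on 2.x
    val sc = new SparkContext(conf)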

We have run terasort with a couple of high-end worker machines, reading and
writing from HDFS, with 5-10 mount points allocated for HDFS and the Spark
local directories. We used multiple configurations, like 10 workers x 10
CPUs x 10 GB and 25 workers x 6 CPUs x 6 GB running on each of the two
machines, with 512 MB HDFS blocks and 1000-2000 parts. All of this chatting
over 10 GbE worked well.
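
As a rough sketch of how one of those layouts looks from the driver side
(the app name and HDFS path below are made up for illustration):

    // Hypothetical driver-side settings matching the 25W-6CPU-6GB layout.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("terasort-bench")             // hypothetical app name
      .set("spark.executor.cores", "6")
      .set("spark.executor.memory", "6g")
      .set("spark.default.parallelism", "2000") // match the 1000-2000 part count
    val sc = new SparkContext(conf)

    // With 512 MB HDFS blocks, a 1 TB input splits into roughly 2048 parts,
    // so a parallelism of ~2000 lines up with about one task per block.
    val input = sc.textFile("hdfs:///benchmarks/terasort-input") // path made up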

On Tue, Jul 12, 2016 at 3:39 AM, Kartik Mathur <kar...@bluedata.com> wrote:

> I am trying to run terasort in Spark on a 7-node cluster with only 10 GB
> of data, and executors get lost with a GC overhead limit exceeded error.
>
> This is what my cluster looks like -
>
>
>    - *Alive Workers:* 7
>    - *Cores in use:* 28 Total, 2 Used
>    - *Memory in use:* 56.0 GB Total, 1024.0 MB Used
>    - *Applications:* 1 Running, 6 Completed
>    - *Drivers:* 0 Running, 0 Completed
>    - *Status:* ALIVE
>
> Each worker has 8 cores and 4 GB of memory.
>
> My question is: how do people running in production decide these
> properties -
>
> 1) --num-executors
> 2) --executor-cores
> 3) --executor-memory
> 4) num of partitions
> 5) spark.default.parallelism
>
> Thanks,
> Kartik
