Thanks, Keith. We have set SPARK_WORKER_INSTANCES=8. Does that mean we are
running 8 workers on a single machine, each with 1 thread, which gives us
the 8 threads?
Is there a preference for running 1 worker with 8 threads inside it? These
are dual-CPU machines, so I believe we need at least 2 workers in
Hi Supun,
A couple of things with regard to your question.
--executor-cores specifies the number of worker threads per executor JVM.
According to your requirement, this should be set to 8.
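As a sketch of the two setups being discussed (assuming Spark standalone mode; the values come from this thread, and the submit line is illustrative):

```shell
# Option A: 8 workers per machine, 1 core each (the current configuration)
# in conf/spark-env.sh on each worker machine:
export SPARK_WORKER_INSTANCES=8
export SPARK_WORKER_CORES=1

# Option B: 1 worker per machine exposing 8 cores:
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_CORES=8

# Either way, request 8 threads per executor at submit time:
spark-submit --executor-cores 8 ...
```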
*repartitionAndSortWithinPartitions* is an RDD operation, and RDD operations
in Spark are not performant, both in terms of execution
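To make the semantics concrete, here is a plain-Python sketch of what *repartitionAndSortWithinPartitions* does (no Spark required): records are hash-partitioned by key, then each partition is sorted by key independently; there is no global sort across partitions.

```python
def repartition_and_sort(records, num_partitions):
    """records: iterable of (key, value) pairs.

    Hash-partitions by key (mirroring Spark's default HashPartitioner),
    then sorts within each partition only.
    """
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[hash(key) % num_partitions].append((key, value))
    # Sort each partition locally; keys are NOT globally ordered.
    return [sorted(p) for p in partitions]

data = [(3, "c"), (1, "a"), (2, "b"), (5, "e"), (4, "d")]
parts = repartition_and_sort(data, 2)
# In CPython, where hash(n) == n for small ints:
# parts == [[(2, 'b'), (4, 'd')], [(1, 'a'), (3, 'c'), (5, 'e')]]
```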
Hi all,
We are trying to measure the sorting performance of Spark. We have a
16-node cluster with 48 cores and 256 GB of RAM in each machine, and a
10Gbps network.
Let's say we are running with 128 parallel tasks and each partition
generates about 1GB of data (total 128GB).
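For reference, the arithmetic behind this setup (using the figures quoted above):

```python
nodes = 16
cores_per_node = 48
tasks = 128
partition_gb = 1

total_cores = nodes * cores_per_node       # 768 cores in the cluster
tasks_per_node = tasks // nodes            # 8 tasks per node
total_data_gb = tasks * partition_gb       # 128 GB total
data_per_node_gb = total_data_gb // nodes  # 8 GB per node
```

With 128 tasks on 768 cores, each node runs only 8 concurrent tasks against its 48 cores, so the cluster is under-subscribed at this parallelism.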
We are using the method *repartitionAndSortWithinPartitions*.