Hi Supun,
A couple of things with regard to your question.
--executor-cores means the number of worker threads per VM. According to
your requirement this should be set to 8.
*repartitionAndSortWithinPartitions* is an RDD operation. RDD operations in
Spark are not performant both in terms of execu
Hi, Apache Spark PMC members.
Can we cut Apache Spark 2.4.4 next Monday (22nd July)?
Bests,
Dongjoon.
On Fri, Jul 12, 2019 at 3:18 PM Dongjoon Hyun
wrote:
> Thank you, Jacek.
>
> BTW, I added `@private` since we need PMC's help to make an Apache Spark
> release.
>
> Can I get more feedback f
Hi All,
Could you please help me fix the issue below, using Spark 2.4 and Scala 2.12?
How do we extract multiple values from the given file name pattern using
a Spark/Scala regular expression? Please
give me some ideas on the approach below.
object Driver {
private val filePattern =
xyzabc_so
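
For the question above, here is a minimal sketch of pulling several values out of a file name with capture groups in plain Scala. The original file name pattern is cut off, so the layout used here (`<prefix>_<source>_<yyyyMMdd>.<extension>`), the sample file name, and the names `FileNameFields`, `FileNameParts`, and `parse` are all hypothetical stand-ins, not the poster's actual format.

```scala
import scala.util.matching.Regex

object FileNameFields {
  // ASSUMPTION: hypothetical layout <prefix>_<source>_<yyyyMMdd>.<extension>.
  // Each parenthesised group captures one value from the file name.
  private val FilePattern: Regex =
    """([A-Za-z]+)_([A-Za-z0-9]+)_(\d{8})\.(\w+)""".r

  // Hypothetical container for the extracted values.
  final case class FileNameParts(
      prefix: String,
      source: String,
      date: String,
      extension: String)

  // Returns Some(parts) when the whole name matches the pattern, None otherwise.
  // A Regex used as an extractor must match the entire input string.
  def parse(fileName: String): Option[FileNameParts] = fileName match {
    case FilePattern(prefix, source, date, ext) =>
      Some(FileNameParts(prefix, source, date, ext))
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    println(parse("sales_region42_20190715.csv")) // Some(FileNameParts(sales,region42,20190715,csv))
    println(parse("not-a-match.txt"))             // None
  }
}
```

Returning an `Option` keeps non-matching names from throwing a `MatchError`. If the values are needed as DataFrame columns instead, the same pattern could probably be applied with `regexp_extract` over `input_file_name()` from `org.apache.spark.sql.functions`, one call per group index.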
Hi all,
We are trying to measure the sorting performance of Spark. We have a
16-node cluster with 48 cores and 256 GB of RAM in each machine and a 10 Gbps
network.
Let's say we are running with 128 parallel tasks and each partition
generates about 1GB of data (total 128GB).
We are using the method *
Hi all,
Forgive this naïveté, I'm looking for reassurance from some experts!
In the past we created a tailored Spark library for our organisation,
implementing Spark functions in Scala with Python and R "wrappers" on top, but
the focus on Scala has alienated our analysts/statisticians/data scie