That looks roughly right, though you will want to mark the Spark
dependencies as provided. Do you need netlib directly?
PySpark won't matter here if you're in Scala; whatever is installed with
pip wouldn't matter in any event.
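Concretely, that means something like this in build.sbt (the artifact list below is just an example, not your exact build):

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.4.4" % Provided,
  "org.apache.spark" %% "spark-sql"   % "2.4.4" % Provided,
  "org.apache.spark" %% "spark-mllib" % "2.4.4" % Provided
)

With Provided scope, sbt compiles against Spark but leaves it out of your assembly jar, since the cluster already supplies those classes at runtime.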
On Tue, Aug 25, 2020 at 3:30 AM Aviad Klein wrote:
Hey Chris and Sean, thanks for taking the time to answer.
Perhaps my installation of PySpark is off, although I did use version 2.4.4.
When developing in Scala and PySpark, how do you set up your environment?
I used sbt for Scala Spark:
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "2.4.4", // completed from the truncated
  "org.apache.spark" %% "spark-mllib" % "2.4.4"  // original; artifacts illustrative
)
Hi Luca,
Thanks for sharing the feedback. We'll include these recommendations in our
tests. However, we think the issue we're seeing right now is due to the
difference in the amount of data the executors download from storage. In the
case of S3, the executors are downloading almost 50 GB of data.
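If it helps to pin down that 50 GB figure, one way to measure bytes read from storage is a SparkListener that sums task input metrics. A minimal sketch, assuming a SparkSession named spark is in scope:

import java.util.concurrent.atomic.AtomicLong
import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

val bytesRead = new AtomicLong(0L)
spark.sparkContext.addSparkListener(new SparkListener {
  override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
    // taskMetrics can be null for some failed tasks
    val m = taskEnd.taskMetrics
    if (m != null) bytesRead.addAndGet(m.inputMetrics.bytesRead)
  }
})
// run the job, then:
// println(s"total input bytes read: ${bytesRead.get}")

The same numbers also show up per stage in the Spark UI under Input Size / Records, which may be the quicker check when comparing the S3 and non-S3 runs.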