Hi Sean, thanks for the tip. I'm just running my app via spark-submit on the CLI, i.e.:

    spark-submit --class X --master local[*] assembly.jar

so I'll now add the driver memory to the CLI args instead, i.e.:

    spark-submit --class X --master local[*] --driver-memory 8g assembly.jar
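For completeness, here's a rough sketch of what I think the revised setup looks like (X, assembly.jar and the 8g value are just the placeholders from my command above, not the real names). The spark.driver.memory line comes out of the builder since, per your note, it can't change anything once the driver JVM is already running; and since this is local[*] the executors share that same driver JVM heap, if I understand correctly:

    // launched with (placeholders): spark-submit --class X --master local[*] --driver-memory 8g assembly.jar
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession
      .builder
      .master("local[*]")
      .appName("OOM")
      .config("spark.driver.host", "localhost")
      .config("spark.driver.maxResultSize", "0")
      .config("spark.sql.caseSensitive", "false")
      .config("spark.sql.adaptive.enabled", "true")
      .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
      // spark.driver.memory removed -- supplied via --driver-memory on the CLI instead
      .getOrCreate()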
Unless I have this wrong? Thx

On Thu, Jul 1, 2021 at 1:43 PM Sean Owen <sro...@gmail.com> wrote:

> You need to set driver memory before the driver starts, on the CLI or
> however you run your app, not in the app itself. By the time the driver
> starts to run your app, its heap is already set.
>
> On Thu, Jul 1, 2021 at 12:10 AM javaguy Java <javagu...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm getting Java OOM errors even though I'm setting my driver memory to
>> 24g and I'm executing against local[*]
>>
>> I was wondering if anyone can give me any insight. The server this job is
>> running on has more than enough memory as does the spark driver.
>>
>> The final result does write 3 csv files that are 300MB each so there's no
>> way its coming close to the 24g
>>
>> From the OOM, I don't know about the internals of Spark itself to tell me
>> where this is failing + how I should refactor or change anything
>>
>> Would appreciate any advice on how I can resolve
>>
>> Thx
>>
>> Parameters here:
>>
>> val spark = SparkSession
>>   .builder
>>   .master("local[*]")
>>   .appName("OOM")
>>   .config("spark.driver.host", "localhost")
>>   .config("spark.driver.maxResultSize", "0")
>>   .config("spark.sql.caseSensitive", "false")
>>   .config("spark.sql.adaptive.enabled", "true")
>>   .config("spark.sql.adaptive.coalescePartitions.enabled", "true")
>>   .config("spark.driver.memory", "24g")
>>   .getOrCreate()
>>
>> My OOM errors are below:
>>
>> driver): java.lang.OutOfMemoryError: Java heap space
>>   at java.io.BufferedOutputStream.<init>(BufferedOutputStream.java:76)
>>   at org.apache.spark.storage.DiskBlockObjectWriter$ManualCloseBufferedOutputStream$1.<init>(DiskBlockObjectWriter.scala:109)
>>   at org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:110)
>>   at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:118)
>>   at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:245)
>>   at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:158)
>>   at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
>>   at org.apache.spark.scheduler.Task.run(Task.scala:127)
>>   at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
>>   at org.apache.spark.executor.Executor$TaskRunner$$Lambda$1792/1058609963.apply(Unknown Source)
>>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>   at java.lang.Thread.run(Thread.java:748)
>>
>> driver): java.lang.OutOfMemoryError: Java heap space
>>   at net.jpountz.lz4.LZ4BlockOutputStream.<init>(LZ4BlockOutputStream.java:102)
>>   at org.apache.spark.io.LZ4CompressionCodec.compressedOutputStream(CompressionCodec.scala:145)
>>   at org.apache.spark.serializer.SerializerManager.wrapForCompression(SerializerManager.scala:158)
>>   at org.apache.spark.serializer.SerializerManager.wrapStream(SerializerManager.scala:133)
>>   at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:122)
>>   at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:245)
>>   at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:158)
>>   at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>>   at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
>>   at org.apache.spark.scheduler.Task.run(Task.scala:127)
>>   at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:446)
>>   at org.apache.spark.executor.Executor$TaskRunner$$Lambda$1792/249605067.apply(Unknown Source)
>>   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
>>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
>>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>>   at java.lang.Thread.run(Thread.java:748)
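PS: to sanity-check Sean's point above (that the heap is fixed before my code runs) is actually being addressed by the CLI flag, I'm thinking of printing the driver heap right after the session is created before rerunning the full job. Just a sketch; the numbers depend on whatever gets passed to --driver-memory, and maxMemory will report a bit less than the nominal value because of JVM overhead:

    // right after SparkSession.getOrCreate():
    // max heap actually available to the driver JVM -- should roughly match --driver-memory
    println(s"Driver max heap: ${Runtime.getRuntime.maxMemory / (1024 * 1024)} MB")
    // the driver memory value spark-submit recorded in the conf, if any
    println(s"spark.driver.memory = ${spark.sparkContext.getConf.getOption("spark.driver.memory")}")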