I did some instrumentation to trace where DirectByteBuffers are being created,
and it turns out that the following system properties should be set in
addition to spark.shuffle.io.preferDirectBufs=false in the Spark config:

io.netty.noUnsafe=true
io.netty.threadLocalDirectBufferSize=0

This should force Netty to mostly use on-heap buffers and thus increase the
stability of Spark jobs that perform a lot of shuffle.
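For example, a minimal sketch of wiring this up when building the job's
SparkConf (assumption: the Netty properties are delivered to the executor
JVMs via spark.executor.extraJavaOptions; adjust if you already pass other
JVM options there):

  import org.apache.spark.SparkConf

  // Prefer heap buffers in Spark's Netty-based shuffle transport, and pass
  // the Netty flags down to the executor JVMs as system properties.
  val conf = new SparkConf()
    .set("spark.shuffle.io.preferDirectBufs", "false")
    .set("spark.executor.extraJavaOptions",
      "-Dio.netty.noUnsafe=true -Dio.netty.threadLocalDirectBufferSize=0")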
I have filed SPARK-18787 to either force these settings when
spark.shuffle.io.preferDirectBufs=false is set in the Spark config or to
document them. I hope this is helpful for other users as well.

Thanks,
Aniket

On Sat, Nov 26, 2016 at 3:31 PM Koert Kuipers <ko...@tresata.com> wrote:

> I agree that off-heap memory usage is unpredictable.
>
> When we used RDDs, the memory was mostly on-heap and the total usage was
> predictable, and we almost never had YARN killing executors.
>
> Now with DataFrames the memory usage is both on and off heap, and we have
> no way of limiting the off-heap memory usage by Spark, yet YARN requires
> a maximum total memory usage and kills the executor if you go over it.
>
> On Fri, Nov 25, 2016 at 12:14 PM, Aniket Bhatnagar <
> aniket.bhatna...@gmail.com> wrote:
>
> > Thanks Rohit, Rodrick and Shreya. I tried changing
> > spark.yarn.executor.memoryOverhead to 10 GB and lowering executor
> > memory to 30 GB, and neither worked. I finally had to reduce the
> > number of cores per executor to 18 (from 36) in addition to setting a
> > higher spark.yarn.executor.memoryOverhead and a lower executor memory
> > size. I had to trade off performance for reliability.
> >
> > Unfortunately, Spark does a poor job of reporting off-heap memory
> > usage. From the profiler, it seems that the job's heap usage is pretty
> > static, but the off-heap memory fluctuates quite a lot. It looks like
> > the bulk of the off-heap memory is used by
> > io.netty.buffer.UnpooledUnsafeDirectByteBuf while the shuffle client
> > is trying to read blocks from the shuffle service. It looks like
> > org.apache.spark.network.util.TransportFrameDecoder retains them in
> > its buffers field while decoding responses from the shuffle service.
> > So far, it's not clear why it needs to hold multiple GBs in these
> > buffers. Perhaps increasing the number of partitions may help with
> > this.
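> >
> > For instance, something along these lines (an illustrative sketch
> > only: the partition count and join key are made up, left and right
> > stand in for the DataFrames being joined, and spark is the
> > SparkSession):
> >
> >   import org.apache.spark.sql.functions.col
> >
> >   // More shuffle partitions mean smaller shuffle blocks, so the
> >   // chunks the shuffle client buffers while fetching should be
> >   // smaller too.
> >   spark.conf.set("spark.sql.shuffle.partitions", "4000")
> >   val joined = left.repartition(4000, col("key")).join(right, "key")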
> >
> > Thanks,
> > Aniket
> >
> > On Fri, Nov 25, 2016 at 1:09 AM Shreya Agarwal <shrey...@microsoft.com>
> > wrote:
> >
> > I don't think it's just memory overhead. It might be better to use an
> > executor with less heap space (30 GB?). 46 GB would mean more data
> > loaded into memory and more GC, which can cause issues.
> >
> > Also, have you tried to persist data in any way? If so, that might be
> > causing an issue.
> >
> > Lastly, I am not sure if your data has a skew that is forcing a lot of
> > data onto one executor node.
> >
> > From: Rodrick Brown <rodr...@orchardplatform.com>
> > Sent: Friday, November 25, 2016 12:25 AM
> > To: Aniket Bhatnagar <aniket.bhatna...@gmail.com>
> > Cc: user <user@spark.apache.org>
> > Subject: Re: OS killing Executor due to high (possibly off heap)
> > memory usage
> >
> > Try setting spark.yarn.executor.memoryOverhead 10000.
> >
> > On Thu, Nov 24, 2016 at 11:16 AM, Aniket Bhatnagar <
> > aniket.bhatna...@gmail.com> wrote:
> >
> > Hi Spark users,
> >
> > I am running a job that joins huge datasets (7 TB+), and the executors
> > keep crashing randomly, eventually causing the job to crash. There are
> > no out-of-memory exceptions in the logs, and from the dmesg output it
> > seems the OS killed the JVM because of high memory usage. My suspicion
> > is that off-heap usage by the executor is causing this, as I am
> > limiting the executor's on-heap usage to 46 GB and each host running
> > an executor has 60 GB of RAM.
> >
> > After the executor crashes, I can see that the external shuffle
> > manager (org.apache.spark.network.server.TransportRequestHandler) logs
> > a lot of channel closed exceptions in the YARN node manager logs. This
> > leads me to believe that something triggers an out-of-memory condition
> > during shuffle read. Is there a configuration to completely disable
> > the usage of off-heap memory? I have tried setting
> > spark.shuffle.io.preferDirectBufs=false, but the executor is still
> > getting killed by the same error.
> >
> > Cluster details:
> > 10 AWS c4.8xlarge hosts
> > RAM on each host - 60 GB
> > Number of cores on each host - 36
> > Additional hard disk on each host - 8 TB
> >
> > Spark configuration:
> > dynamic allocation enabled
> > external shuffle service enabled
> > spark.driver.memory 1024M
> > spark.executor.memory 47127M
> > Spark master yarn-cluster
> >
> > Sample error in the YARN node manager logs:
> > 2016-11-24 10:34:06,507 ERROR
> > org.apache.spark.network.server.TransportRequestHandler
> > (shuffle-server-50): Error sending result
> > ChunkFetchSuccess{streamChunkId=StreamChunkId{streamId=919299554123,
> > chunkIndex=0},
> > buffer=FileSegmentManagedBuffer{file=/mnt3/yarn/usercache/hadoop/appcache/application_1479898345621_0006/blockmgr-ad5301a9-e1e9-4723-a8c4-9276971b2259/2c/shuffle_3_963_0.data,
> > offset=0, length=669014456}} to /10.192.108.170:52782; closing connection
> > java.nio.channels.ClosedChannelException
> >
> > Error in dmesg:
> > [799873.309897] Out of memory: Kill process 50001 (java) score 927 or
> > sacrifice child
> > [799873.314439] Killed process 50001 (java) total-vm:65652448kB,
> > anon-rss:57246528kB, file-rss:0kB
> >
> > Thanks,
> > Aniket
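> >
> > As a back-of-the-envelope check of the numbers above (a sketch: the
> > 0.10 factor and 384 MB floor are the documented defaults for
> > spark.yarn.executor.memoryOverhead; the arithmetic is illustrative):
> >
> >   // Requested YARN container size versus available host RAM.
> >   val heapMb      = 47127L                                 // spark.executor.memory
> >   val overheadMb  = math.max(384L, (0.10 * heapMb).toLong) // ~4712 MB by default
> >   val containerMb = heapMb + overheadMb                    // ~51839 MB on a ~61440 MB host
> >
> > That leaves under 10 GB of headroom for the OS, the node manager and
> > the external shuffle service, while the dmesg output above reports an
> > anon-rss of ~54.6 GB at kill time, so a few GB of direct buffers are
> > enough to push the host into the OOM killer.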