Could you please point us to the source that states the general guideline (80-85%)?

Even if there is a general guideline, it is probably there to keep the performance of a Spark application high (and to *distinguish* it from Hadoop). But if you are not too concerned about the *performance* hit of going from memory to disk, then you could use virtual memory to your advantage. In fact, I think the OS could do a pretty good job of data management by keeping only the necessary data in RAM while imposing no hard limit. (It would be great to have benchmarks if anyone has tested this before.)

Also, we should *tread* carefully when applying general guidelines to problems. They might not be *relevant* at all.

Deepak

On Mon, Apr 30, 2018 at 9:06 PM, Lalwani, Jayesh <jayesh.lalw...@capitalone.com> wrote:

> Although there is such a thing as virtualization of memory done at the OS
> layer, the JVM imposes its own limit, which is controlled by the
> *spark.executor.memory* and *spark.driver.memory* configurations. The amount
> of memory allocated by the JVM is controlled by those parameters. General
> guidelines say that executor and driver memory should be kept at 80-85% of
> available RAM. So, if the general guidelines are followed, *virtual memory*
> is moot.
>
> *From:* Deepak Goel <deic...@gmail.com>
> *Date:* Saturday, April 28, 2018 at 12:58 PM
> *To:* Stephen Boesch <java...@gmail.com>
> *Cc:* klrmowse <klrmo...@gmail.com>, "user @spark" <user@spark.apache.org>
> *Subject:* Re: [Spark 2.x Core] .collect() size limit
>
> I believe the virtualization of memory happens at the OS layer, hiding it
> completely from the application layer.
>
> On Sat, 28 Apr 2018, 22:22 Stephen Boesch, <java...@gmail.com> wrote:
>
> While it is certainly possible to use virtual memory, I have seen warnings
> in a number of places that collect() results must be able to fit in memory.
> I'm not sure whether that applies to *all* Spark calculations, but at the
> very least each of the specific collect() calls that are performed would
> need to be verified.
>
> And maybe *all* collects do require sufficient memory. Would you like to
> check the source code to see whether disk-backed collects actually happen
> in some cases?
>
> 2018-04-28 9:48 GMT-07:00 Deepak Goel <deic...@gmail.com>:
>
> There is such a thing as *virtual memory*.
>
> On Sat, 28 Apr 2018, 21:19 Stephen Boesch, <java...@gmail.com> wrote:
>
> Do you have a machine with terabytes of RAM? AFAIK collect() requires
> RAM, so that would be your limiting factor.
>
> 2018-04-28 8:41 GMT-07:00 klrmowse <klrmo...@gmail.com>:
>
> I am currently trying to find a workaround for the Spark application I am
> working on so that it does not have to use .collect(),
> but, for now, it is going to have to use .collect().
>
> What is the size limit (memory for the driver) of RDD file that .collect()
> can work with?
>
> I've been scouring Google search (S.O., blogs, etc.), and everyone
> cautions about .collect() but does not specify how huge is huge... are we
> talking about a few gigabytes? terabytes?? petabytes???
>
> thank you
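
For anyone who wants to see what the 80-85% figure quoted in this thread works out to, here is a minimal sketch of the arithmetic. The helper names (`heap_budget_mb`, `as_spark_conf`) and the 64 GB machine are assumptions made up for illustration. The percentage range is simply the number mentioned above, whose source has not been established in this thread, so treat it as a rule of thumb and not as a documented Spark limit.

```python
# Hypothetical helpers: the 80-85% range is the figure quoted in this thread,
# not a value taken from the Spark documentation.
def heap_budget_mb(available_ram_mb, low_pct=80, high_pct=85):
    """Return the (low, high) JVM heap budget in MB for a machine's RAM."""
    # Integer arithmetic (floor division) avoids floating-point rounding.
    return (available_ram_mb * low_pct // 100,
            available_ram_mb * high_pct // 100)


def as_spark_conf(key, size_mb):
    """Format a size in MB as a Spark memory setting, e.g. 'key=52428m'."""
    return f"{key}={size_mb}m"


if __name__ == "__main__":
    low, high = heap_budget_mb(64 * 1024)  # assumed machine with 64 GB of RAM
    print(as_spark_conf("spark.executor.memory", low))
    print(as_spark_conf("spark.executor.memory", high))
```

The same calculation would apply to `spark.driver.memory`. Since collect() returns its results to the driver, the driver's heap setting is the figure that matters for the original question, whatever the OS does with virtual memory underneath the JVM.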