I believe the virtualization of memory happens at the OS layer, hiding it completely from the application layer.
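Even so, the practical cap on collect() is the driver's JVM heap (-Xmx) plus Spark's own spark.driver.maxResultSize check (default 1g; 0 disables it), and OS paging changes neither. The usual workarounds are rdd.toLocalIterator, which streams results back one partition at a time so only the largest single partition has to fit on the driver, or writing results out from the executors instead of collecting them at all. A rough sketch in Scala - the app name, paths, and the 2g value are placeholders, not recommendations:

    import org.apache.spark.sql.SparkSession

    object CollectAlternatives {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("collect-alternatives")
          .master("local[*]")  // illustrative; set for your cluster
          // Spark aborts any action whose serialized results exceed this
          // cap on the driver (default 1g; 0 disables the check).
          .config("spark.driver.maxResultSize", "2g")
          .getOrCreate()

        val rdd = spark.sparkContext.textFile("hdfs:///some/input")  // placeholder path

        // Option 1: stream results back one partition at a time instead of
        // materializing the whole RDD on the driver. Driver memory needed
        // is bounded by the largest single partition.
        rdd.toLocalIterator.foreach(println)

        // Option 2: never bring the data to the driver at all - write it
        // out in parallel from the executors.
        rdd.saveAsTextFile("hdfs:///some/output")  // placeholder path

        spark.stop()
      }
    }

Note that toLocalIterator runs one job per partition, so it trades memory for extra scheduling overhead.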
On Sat, 28 Apr 2018, 22:22 Stephen Boesch, <java...@gmail.com> wrote:

> While it is certainly possible to use VM, I have seen in a number of places
> warnings that collect() results must be able to fit in memory. I'm not
> sure if that applies to *all* spark calculations, but at the very least
> each of the specific collect()s that are performed would need to be
> verified.
>
> And maybe *all* collects do require sufficient memory - would you like to
> check the source code to see if there are disk-backed collects actually
> happening for some cases?
>
> 2018-04-28 9:48 GMT-07:00 Deepak Goel <deic...@gmail.com>:
>
>> There is such a thing as *virtual memory*
>>
>> On Sat, 28 Apr 2018, 21:19 Stephen Boesch, <java...@gmail.com> wrote:
>>
>>> Do you have a machine with terabytes of RAM? afaik collect() requires
>>> RAM - so that would be your limiting factor.
>>>
>>> 2018-04-28 8:41 GMT-07:00 klrmowse <klrmo...@gmail.com>:
>>>
>>>> i am currently trying to find a workaround for the Spark application i
>>>> am working on so that it does not have to use .collect()
>>>>
>>>> but, for now, it is going to have to use .collect()
>>>>
>>>> what is the size limit (memory for the driver) of an RDD that
>>>> .collect() can work with?
>>>>
>>>> i've been scouring google-search - S.O., blogs, etc - and everyone is
>>>> cautioning about .collect(), but nobody specifies how huge is huge...
>>>> are we talking about a few gigabytes? terabytes?? petabytes???
>>>>
>>>> thank you