I believe the virtualization of memory happens at the OS layer, hiding it completely from the application layer.
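Even so, the practical cap on collect() is the driver's JVM heap (-Xmx) plus Spark's own spark.driver.maxResultSize check (default 1g; 0 disables it), and OS paging changes neither. The usual workarounds are rdd.toLocalIterator, which streams results back one partition at a time so only the largest single partition has to fit on the driver, or writing results out from the executors instead of collecting them at all. A rough sketch in Scala - the app name, paths, and the 2g value are placeholders, not recommendations:

    import org.apache.spark.sql.SparkSession

    object CollectAlternatives {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("collect-alternatives")
          .master("local[*]")  // illustrative; set for your cluster
          // Spark aborts any action whose serialized results exceed this
          // cap on the driver (default 1g; 0 disables the check).
          .config("spark.driver.maxResultSize", "2g")
          .getOrCreate()

        val rdd = spark.sparkContext.textFile("hdfs:///some/input")  // placeholder path

        // Option 1: stream results back one partition at a time instead of
        // materializing the whole RDD on the driver. Driver memory needed
        // is bounded by the largest single partition.
        rdd.toLocalIterator.foreach(println)

        // Option 2: never bring the data to the driver at all - write it
        // out in parallel from the executors.
        rdd.saveAsTextFile("hdfs:///some/output")  // placeholder path

        spark.stop()
      }
    }

Note that toLocalIterator runs one job per partition, so it trades memory for extra scheduling overhead.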
On Sat, 28 Apr 2018, 22:22 Stephen Boesch, <java...@gmail.com> wrote:

> While it is certainly possible to use VM, I have seen in a number of places
> warnings that collect() results must be able to fit in memory. I'm not
> sure if that applies to *all* spark calculations, but at the very least
> each of the specific collect()s that are performed would need to be
> verified.
>
> And maybe *all* collects do require sufficient memory - would you like to
> check the source code to see if there are disk-backed collects actually
> happening for some cases?
>
> 2018-04-28 9:48 GMT-07:00 Deepak Goel <deic...@gmail.com>:
>
>> There is such a thing as *virtual memory*
>>
>> On Sat, 28 Apr 2018, 21:19 Stephen Boesch, <java...@gmail.com> wrote:
>>
>>> Do you have a machine with terabytes of RAM? afaik collect() requires
>>> RAM - so that would be your limiting factor.
>>>
>>> 2018-04-28 8:41 GMT-07:00 klrmowse <klrmo...@gmail.com>:
>>>
>>>> i am currently trying to find a workaround for the Spark application i
>>>> am working on so that it does not have to use .collect()
>>>>
>>>> but, for now, it is going to have to use .collect()
>>>>
>>>> what is the size limit (memory for the driver) of an RDD that
>>>> .collect() can work with?
>>>>
>>>> i've been scouring google-search - S.O., blogs, etc - and everyone is
>>>> cautioning about .collect(), but nobody specifies how huge is huge...
>>>> are we talking about a few gigabytes? terabytes?? petabytes???
>>>>
>>>> thank you