Cache in memory only the data on which you want to run the iterative
algorithm.
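
For example, a rough sketch in Scala (assuming `sc` is the SparkContext, e.g.
from spark-shell; the path and the toy update are made up, not from this
thread):

    import org.apache.spark.SparkContext._               // DoubleRDDFunctions such as sum() (Spark 1.x)
    import org.apache.spark.storage.StorageLevel

    val raw = sc.textFile("hdfs:///path/to/input")       // full data set, left uncached

    // cache only the slice the iterative part actually reuses
    val points = raw.map(_.split(',').map(_.toDouble))
    points.persist(StorageLevel.MEMORY_ONLY)

    var w = 0.0
    for (i <- 1 to 10) {
      // toy gradient-style update: each pass reads the cached RDD, not HDFS
      w -= 0.01 * points.map(p => p(0) * w - p(1)).sum()
    }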

For one-pass map-reduce style operations, it's better not to cache at all if
you are in a memory crunch; caching only pays off when an RDD is reused.

Also, schedule your persist() and unpersist() calls so that the RAM is used
well, i.e. keep an RDD cached only while downstream jobs still need it.
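
Something along these lines (again just a sketch with made-up paths and
filters): persist an intermediate RDD only while downstream jobs still reuse
it, release it as soon as they are done, and leave one-pass jobs uncached.

    import org.apache.spark.SparkContext._               // pair-RDD ops like reduceByKey (Spark 1.x)
    import org.apache.spark.storage.StorageLevel

    val cleaned = sc.textFile("hdfs:///path/to/input").map(_.trim.toLowerCase)

    // persisted only while the two jobs below still need it;
    // MEMORY_AND_DISK spills partitions that don't fit instead of recomputing them
    cleaned.persist(StorageLevel.MEMORY_AND_DISK)
    cleaned.filter(_.contains("error")).saveAsTextFile("hdfs:///out/errors")
    val total = cleaned.count()
    cleaned.unpersist()                                  // give the RAM back before later stages

    // a plain one-pass map-reduce job: no caching needed at all
    sc.textFile("hdfs:///path/to/other")
      .map(line => (line.split('\t')(0), 1L))
      .reduceByKey(_ + _)
      .saveAsTextFile("hdfs:///out/counts")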

On Tue, Sep 30, 2014 at 4:34 PM, Liquan Pei <liquan...@gmail.com> wrote:

> Hi,
>
> By default, 60% of the JVM heap is reserved for RDD caching, so in your
> case roughly 72 GB is available for cached RDDs, which means your 200 GB
> of data may not fit entirely in memory. You can check the RDD memory
> statistics on the Storage tab of the web UI.
>
> Hope this helps!
> Liquan
>
>
>
> On Tue, Sep 30, 2014 at 4:11 PM, anny9699 <anny9...@gmail.com> wrote:
>
>> Hi,
>>
>> Is there any guidance on how much total memory is needed for a data set
>> of a given size to achieve reasonably good speed?
>>
>> I have a data set of around 200 GB, and the total memory across my 8
>> machines is around 120 GB. Is that too small to process data this big?
>> Even reading the data in and doing simple initial processing seems to
>> take forever.
>>
>> Thanks a lot!
>>
>
>
> --
> Liquan Pei
> Department of Physics
> University of Massachusetts Amherst
>
