Only cache the data in memory for the part of the job where you run the iterative algorithm.
For map-reduce style operations it is better not to cache at all if you are short on memory. Also schedule your persist and unpersist calls so that the RAM is used well (a couple of illustrative sketches are appended below the quoted thread).

On Tue, Sep 30, 2014 at 4:34 PM, Liquan Pei <liquan...@gmail.com> wrote:

> Hi,
>
> By default, 60% of JVM memory is reserved for RDD caching, so in your
> case 72 GB of memory is available for RDDs, which means that your total
> data may not fit in memory. You can check the RDD memory statistics via
> the Storage tab in the web UI.
>
> Hope this helps!
> Liquan
>
> On Tue, Sep 30, 2014 at 4:11 PM, anny9699 <anny9...@gmail.com> wrote:
>
>> Hi,
>>
>> Is there any guidance on how much total memory is needed, for data of a
>> certain size, to achieve reasonably good speed?
>>
>> I have about 200 GB of data, and the current total memory across my 8
>> machines is around 120 GB. Is that too small to run on data this big?
>> Even reading it in and doing some simple initial processing seems to
>> take forever.
>>
>> Thanks a lot!
>
> --
> Liquan Pei
> Department of Physics
> University of Massachusetts Amherst
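To make the persist/unpersist scheduling concrete, here is a minimal Scala sketch. It is not from the thread: the input path, the toy gradient update, and the iteration count are all made up for illustration. The point is that only the RDD reused by the iterative loop is persisted, and it is unpersisted as soon as the loop finishes so the storage memory is freed for later stages.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._
import org.apache.spark.storage.StorageLevel

object IterativeCacheSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("iterative-cache-sketch"))

    // One-pass map-reduce style work: not cached, so it does not compete for scarce RAM.
    val points = sc.textFile("hdfs:///data/points")   // hypothetical input path
      .map(_.split(","))
      .map(a => (a(0).toDouble, a(1).toDouble))

    // Persist only the RDD that the iterative loop reuses.
    points.persist(StorageLevel.MEMORY_ONLY)
    val n = points.count()                            // action that materializes the cache

    var w = 0.0
    for (i <- 1 to 20) {
      // Each pass reads the in-memory copy instead of recomputing from HDFS.
      val gradient = points.map { case (x, y) => (w * x - y) * x }.sum() / n
      w -= 0.1 * gradient
    }

    // Release the cached blocks as soon as the loop is done so later stages get the RAM back.
    points.unpersist()
    sc.stop()
  }
}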
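And a back-of-the-envelope check of Liquan's 72 GB figure. Only the 120 GB total from the question and the Spark 1.x default spark.storage.memoryFraction = 0.6 come from the thread; the even 15 GB-per-machine split is my assumption.

object CacheCapacitySketch {
  def main(args: Array[String]): Unit = {
    val machines = 8
    val executorHeapGb = 15        // assumed: 120 GB total spread over 8 machines
    val storageFraction = 0.6      // spark.storage.memoryFraction default in Spark 1.x

    val cacheGb = machines * executorHeapGb * storageFraction
    println(f"Approx. memory available for cached RDDs: $cacheGb%.0f GB")  // ~72 GB

    // With ~200 GB of data this is well short of a full in-memory fit, which is why
    // the Storage tab of the application web UI (driver port 4040) is worth checking:
    // it shows how much of each persisted RDD actually made it into memory.
  }
}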