Re: memory vs data_size
Only cache the data in memory where you want to run an iterative algorithm. For one-pass map-reduce operations, it's better not to cache at all if you have a memory crunch. Also, schedule your persist() and unpersist() calls so that you use the RAM well: persist just before the data is reused, and unpersist as soon as it no longer is.
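A minimal sketch of that pattern from the spark-shell (the file path, column format, and toy update step are illustrative, not from this thread):

    import org.apache.spark.SparkContext._   // RDD implicits (Spark 1.x)
    import org.apache.spark.storage.StorageLevel

    // Cache only the RDD that the iterative algorithm re-reads on every pass.
    val points = sc.textFile("hdfs:///data/points.csv")   // illustrative path
      .map(_.split(",").map(_.toDouble))
      .persist(StorageLevel.MEMORY_ONLY)

    var center = 0.0
    for (_ <- 1 to 20) {
      // Each pass re-reads `points`, so the cached copy is reused all 20 times.
      val target = points.map(_(0)).mean()
      center += (target - center) / 2
    }

    // Release the RAM as soon as the loop no longer needs it,
    // so later stages can use it.
    points.unpersist()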
Re: memory vs data_size
Hi,

By default, 60% of JVM memory is reserved for RDD caching, so in your case about 72 GB is available for RDDs, which means your total data may not fit in memory. You can check the RDD memory statistics via the Storage tab in the web UI.

Hope this helps!
Liquan

--
Liquan Pei
Department of Physics
University of Massachusetts Amherst
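For reference, the fraction in question is spark.storage.memoryFraction, which defaults to 0.6 in Spark 1.x; a minimal sketch of setting it explicitly, with illustrative per-executor sizes:

    import org.apache.spark.{SparkConf, SparkContext}

    // 8 executors * 15 GB = 120 GB of heap; 0.6 of that (~72 GB) goes to the
    // RDD cache by default. Raising the fraction trades task/shuffle memory
    // for cache space.
    val conf = new SparkConf()
      .setAppName("memory-fraction-demo")
      .set("spark.executor.memory", "15g")          // per machine; illustrative
      .set("spark.storage.memoryFraction", "0.6")   // the 60% default, made explicit

    val sc = new SparkContext(conf)
    // The Storage tab of the web UI (driver host, port 4040 by default) then
    // shows how much of that budget each cached RDD occupies.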
memory vs data_size
Hi,

Is there any guidance on how much total memory is needed for a given data size to achieve reasonably good speed?

I have a dataset of around 200 GB, and the total memory across my 8 machines is around 120 GB. Is that too small to run data this big? Even reading the data in and the simple initial processing seem to take forever.

Thanks a lot!
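Tying the thread's numbers together: one common answer when the data (~200 GB here) exceeds the cache budget (~72 GB) is a serialized, disk-spilling storage level. A hedged sketch, again from the spark-shell with an illustrative path:

    import org.apache.spark.storage.StorageLevel

    // MEMORY_AND_DISK_SER keeps what fits in RAM in serialized form (which
    // packs tighter than deserialized objects) and spills the remainder to
    // local disk instead of dropping and recomputing it.
    val records = sc.textFile("hdfs:///data/input")   // illustrative path
      .persist(StorageLevel.MEMORY_AND_DISK_SER)

    records.count()   // the first action materializes the cache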