Spark may take more RAM than the RDD strictly requires. Can you look at the
Storage tab of the Spark web UI and see how much space the RDD is actually
taking in memory? It may still take more space than on disk, since cached
Java objects carry per-object overhead. Consider enabling RDD compression
as well.
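
A minimal sketch of what I mean (the HDFS path and app name are placeholders;
this uses the system-property style of configuration): storing the RDD
serialized with compression and Kryo usually shrinks the cached footprint
considerably, at some CPU cost on access.

    import org.apache.spark.SparkContext
    import org.apache.spark.storage.StorageLevel

    // Compress serialized RDD blocks, and use Kryo, which is more compact
    // than the default Java serialization.
    System.setProperty("spark.rdd.compress", "true")
    System.setProperty("spark.serializer",
      "org.apache.spark.serializer.KryoSerializer")

    val sc = new SparkContext("local[*]", "rdd-memory-check")

    // MEMORY_ONLY_SER keeps one serialized buffer per partition instead of
    // deserialized Java objects, avoiding per-object JVM overhead.
    val lines = sc.textFile("hdfs:///path/to/folder")
      .persist(StorageLevel.MEMORY_ONLY_SER)

    // Materialize the cache, then check the Storage tab of the web UI
    // to see the actual in-memory size.
    lines.count()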

Mayur Rustagi
Ph: +919632149971
http://www.sigmoidanalytics.com
https://twitter.com/mayur_rustagi



On Tue, Feb 25, 2014 at 6:47 AM, Suraj Satishkumar Sheth <suraj...@adobe.com
> wrote:

>  Hi All,
>
> I have a folder in HDFS whose files total 47GB in size. I am loading
> this in Spark as RDD[String] and caching it. The total amount of RAM that
> Spark uses to cache it is around 97GB. Why is Spark taking up so much
> space for the RDD? Can we reduce the RDD's size in Spark and make it
> similar to its size on disk?
>
>
>
> Thanks and Regards,
>
> Suraj Sheth
>
