Re: memory vs data_size

2014-09-30 Thread Debasish Das
Only cache data in memory where you want to run an iterative
algorithm over it.

For one-pass map-reduce style operations, it's better not to cache if you
are under a memory crunch...

Also schedule your persist and unpersist calls so that you utilize the RAM
well...
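
As a rough sketch of that persist/unpersist pattern (the input path and
RDD names are made up for illustration; standard Spark 1.x Scala API):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.storage.StorageLevel

    val sc = new SparkContext(new SparkConf().setAppName("cache-sketch"))

    // Hypothetical input: one comma-separated feature vector per line.
    val points = sc.textFile("hdfs:///data/points")
      .map(_.split(',').map(_.toDouble))

    // Cache only the RDD the iterative algorithm loops over.
    points.persist(StorageLevel.MEMORY_ONLY)

    var weight = 0.0
    for (_ <- 1 to 10) {
      // Each pass reads `points` from the cache instead of re-reading
      // and re-parsing the input.
      weight += 0.001 * points.map(_.sum).reduce(_ + _)
    }

    // Release the cached blocks as soon as the loop is done, freeing
    // RAM for whatever stage runs next.
    points.unpersist()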



Re: memory vs data_size

2014-09-30 Thread Liquan Pei
Hi,

By default, 60% of the JVM heap is reserved for RDD caching, so in your
case roughly 72 GB (60% of 120 GB) is available for RDDs. Since your
dataset is around 200 GB, it will not fit entirely in memory; partitions
that don't fit are either recomputed or spilled to disk, depending on the
storage level you choose. You can check the RDD memory statistics via the
Storage tab in the web UI.
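
If you want a different split, that fraction can be tuned when the context
is created; a minimal sketch (the 0.7 value is purely illustrative, and
spark.storage.memoryFraction is the setting Spark 1.x ships with):

    import org.apache.spark.{SparkConf, SparkContext}

    // Illustrative only: raise the cache fraction from the 0.6 default,
    // leaving less of the heap for shuffle buffers and task objects.
    val conf = new SparkConf()
      .setAppName("memory-fraction-sketch")
      .set("spark.storage.memoryFraction", "0.7")
    val sc = new SparkContext(conf)

    // With 120 GB of total JVM heap across the cluster, roughly
    // 0.7 * 120 GB = 84 GB would then be usable for cached RDD blocks.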

Hope this helps!
Liquan



-- 
Liquan Pei
Department of Physics
University of Massachusetts Amherst


memory vs data_size

2014-09-30 Thread anny9699
Hi,

Is there any guidance on how much total memory is needed, for a dataset of
a given size, to achieve reasonably good speed?

I have a dataset of around 200 GB, and the total memory across my 8
machines is around 120 GB. Is that too small for data this big? Even
reading the data in and doing simple initial processing seems to take
forever.

Thanks a lot!




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/memory-vs-data-size-tp15443.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org