I am trying to understand how Spark handles memory caching.

I can see that when the application starts, each worker registers a block
manager with 66% of the memory assigned to that JVM:
>Registering block manager server.xxx:44532 with 3.2 GB RAM

When I run my application and cache some RDDs I can see:
>Added rdd_4_42 in memory on yyy.xxx:35527 (size: 454.8 MB, free: 2.7 GB)
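
As a rough sanity check on those two log lines, here is a small arithmetic
sketch in plain Python (it is not Spark code). The 66% fraction is my own
reading of the logs, and the helper names are made up for illustration.

```python
# Rough arithmetic only; not Spark code. Helper names are hypothetical.
STORAGE_FRACTION = 0.66  # assumption: share of the JVM heap given to the block manager


def implied_heap_gb(block_manager_gb, fraction=STORAGE_FRACTION):
    """JVM heap size implied by the block manager capacity shown in the log."""
    return block_manager_gb / fraction


def blocks_that_fit(capacity_gb, block_mb):
    """Number of whole cached blocks of a given size that fit in one block manager."""
    return int(capacity_gb * 1024 // block_mb)


heap = implied_heap_gb(3.2)        # 3.2 GB registered per block manager
fit = blocks_that_fit(3.2, 454.8)  # 454.8 MB per cached RDD block
print(f"implied heap: {heap:.2f} GB, blocks per worker: {fit}")
```

So each worker should hold roughly 7 blocks of this size before its block
manager is full, on a heap of about 4.85 GB.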

From the Scala console application UI I can see that some of the block
managers have 2 RDD blocks, some 1 and some 0.
I assume that the distribution of blocks is driven by the physical location of
the data in HDFS (as I mainly see NODE_LOCAL operations).

Although in many cases I can see how this boosts performance, I cannot see how
I can make use of my full distributed memory. When a worker's cache fills up, I
would expect the next blocks to be sent over the network to the available RAM
of other workers. In practice I do not see this happening. Instead, I mostly
get "GC overhead limit exceeded" errors.

On that note, is there a way to instruct a block manager to trigger an eviction
at 90% capacity and evict e.g. 50%? This is required because when, for some
reason, the block manager starts evicting blocks, the latency becomes enormous
as it repeatedly releases, garbage-collects and allocates memory (on a full,
large heap this takes minutes, as it is done iteratively).
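
To make the question concrete, here is a toy sketch of the policy I have in
mind, in plain Python. It is not Spark's API or its actual eviction logic. The
class name, the `high` and `low` parameters and the LRU ordering are all
hypothetical, and I am reading "evict 50%" as "evict down to 50% of capacity".

```python
from collections import OrderedDict


class WatermarkCache:
    """Toy LRU store: once usage exceeds high * capacity, evict down to low * capacity."""

    def __init__(self, capacity_mb, high=0.9, low=0.5):
        self.capacity_mb = capacity_mb
        self.high = high              # hypothetical trigger threshold (90%)
        self.low = low                # hypothetical target after eviction (50%)
        self.blocks = OrderedDict()   # block id -> size in MB, oldest first
        self.used_mb = 0.0

    def put(self, block_id, size_mb):
        """Add a block, then evict in one batch if the high watermark is crossed.

        Returns the list of evicted block ids (empty if nothing was evicted).
        """
        if block_id in self.blocks:
            self.used_mb -= self.blocks.pop(block_id)
        self.blocks[block_id] = size_mb
        self.used_mb += size_mb

        evicted = []
        if self.used_mb > self.high * self.capacity_mb:
            # Drop the oldest blocks until usage is at or below the low
            # watermark, always keeping the block that was just added.
            while self.used_mb > self.low * self.capacity_mb and len(self.blocks) > 1:
                old_id, old_size = self.blocks.popitem(last=False)
                self.used_mb -= old_size
                evicted.append(old_id)
        return evicted


cache = WatermarkCache(capacity_mb=1000)
results = [cache.put(f"b{i}", 100) for i in range(10)]
```

The point of the two thresholds is that eviction would happen rarely and in one
batch, instead of block by block every time the store is full.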

Thanks,

Ioannis Deligiannis

