Hi, I have a cluster of 20 servers, each with 24 cores and 30GB of RAM allocated to Spark. Spark runs in standalone mode. I am trying to load some 200+GB of files and cache the rows using .cache().
What I would like to do is the following (at the moment from the Scala console):
- Evenly load the files across the 20 servers (preferably using all 20*24 cores for the load)
- Verify that the data are loaded as NODE_LOCAL

Looking at the :4040 console, in some runs I see a lot of NODE_LOCAL tasks, but in others a lot of ANY. Is there a way to identify what a TID in the ANY state is doing?

If I allocate less than roughly double the memory I need, I get an OutOfMemory error. If I use the minPartitions parameter of textFile, i.e. sc.textFile("hdfs://...", 20), then the error goes away. On the other hand, if I allocate enough memory, I can see from the admin console that some of my workers carry too much load while others carry less than half. I understand that I could use a partitioner to balance my data, but I wouldn't expect an OOME if nodes are significantly under-used. Am I missing something?

Thanks,
Ioannis Deligiannis
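To make the partitioning question concrete, here is a minimal sketch of how one might pick the minPartitions argument to sc.textFile so that no single partition has to hold more data than a task can comfortably cache. PartitionSizing, its method name, and the 128MB per-partition budget are all assumptions for illustration, not anything from the original post:

```scala
// Hypothetical helper: pick a partition count for sc.textFile so that
// (a) every core in the cluster gets at least one task, and
// (b) no partition is larger than a chosen per-task byte budget.
object PartitionSizing {
  // totalCores: cluster-wide core count (20 servers * 24 cores = 480 here)
  // inputBytes: total input size (200+GB in this case)
  // targetBytesPerPartition: per-partition budget (e.g. one 128MB HDFS block)
  def suggestedPartitions(totalCores: Int,
                          inputBytes: Long,
                          targetBytesPerPartition: Long): Int = {
    // Enough partitions that none exceeds the byte budget.
    val bySize = math.ceil(inputBytes.toDouble / targetBytesPerPartition).toInt
    // Never fewer partitions than cores, or some cores sit idle during the load.
    math.max(totalCores, bySize)
  }
}
```

With 200GB of input and a 128MB budget this suggests 1600 partitions, so a call might look like sc.textFile("hdfs://...", PartitionSizing.suggestedPartitions(480, 200L * 1024 * 1024 * 1024, 128L * 1024 * 1024)) rather than a hard-coded 20, which would make each partition ~10GB and could explain the OOME when caching.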