Hello Spark fellows :),
I think I need some help to understand how .cache and task input works within a
job.
I have a 7 GB input matrix in HDFS that I load using .textFile(). I also have
a config file which contains an array of 12 Logistic Regression model
parameters, loaded as an
Hi Fanilo,
How many cores are you using per executor? Are you aware that you can
combat the "container is running beyond physical memory limits" error by
bumping the spark.yarn.executor.memoryOverhead property?
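For reference, the property can be set either on the spark-submit command line or in code. A minimal sketch (the 1024 MB figure is only illustrative, not a recommendation from this thread; tune it to your job):

```scala
import org.apache.spark.SparkConf

// Equivalent to: spark-submit --conf spark.yarn.executor.memoryOverhead=1024 ...
// Value is in MB; it is the off-heap headroom YARN grants each executor
// on top of spark.executor.memory.
val conf = new SparkConf()
  .set("spark.yarn.executor.memoryOverhead", "1024")
```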
Also, are you caching the parsed version or the text?
-Sandy
On Wed, Jan 28, 2015 at
= PredictionReader.getFeatures(…).cache
where getFeatures() loads the file and then parses it.
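For what it's worth, the two alternatives Sandy is distinguishing can be sketched like this (the HDFS path, the delimiter, and the parse logic are assumptions for illustration, not details from this thread; assumes a live SparkContext `sc`):

```scala
// Option A: cache the raw text. Each downstream pass over featuresA
// re-runs the parsing, but the cached partitions are just Strings.
val rawCached = sc.textFile("hdfs:///path/to/matrix").cache()
val featuresA = rawCached.map(line => line.split(' ').map(_.toDouble))

// Option B: parse first, then cache. Parsing happens only once, but the
// cached Array[Double] rows typically occupy more memory than the text.
val featuresB = sc.textFile("hdfs:///path/to/matrix")
  .map(line => line.split(' ').map(_.toDouble))
  .cache()
```

With 12 models reading the same features, option B avoids parsing the 7 GB input twelve times, at the cost of a larger cached footprint.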
From: Sandy Ryza [mailto:sandy.r...@cloudera.com]
Sent: Wednesday, January 28, 2015 17:12
To: Andrianasolo Fanilo
Cc: user@spark.apache.org
Subject: Re: RDD caching, memory network input