RDD caching, memory network input

2015-01-28 Thread Andrianasolo Fanilo
Hello Spark fellows :), I think I need some help understanding how .cache and task input work within a job. I have a 7 GB input matrix in HDFS that I load using .textFile(). I also have a config file which contains an array of 12 Logistic Regression Model parameters, loaded as an

Re: RDD caching, memory network input

2015-01-28 Thread Sandy Ryza
Hi Fanilo, How many cores are you using per executor? Are you aware that you can combat the "container is running beyond physical memory limits" error by bumping the spark.yarn.executor.memoryOverhead property? Also, are you caching the parsed version or the text? -Sandy
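Why bumping the overhead helps: on YARN each executor runs in a container sized at the executor heap plus spark.yarn.executor.memoryOverhead, and YARN kills the container once the process's physical memory exceeds that total. A minimal sketch of the arithmetic, assuming the Spark 1.2-era default overhead of max(384 MB, 7% of executor memory) (later releases changed the factor) and ignoring YARN's rounding up to its minimum allocation increment. `ContainerSizing`, `defaultOverheadMb` and `containerMb` are hypothetical names for illustration only.

```scala
// Hypothetical helper illustrating how a YARN executor container is sized.
// Assumes the Spark 1.2-era default overhead: max(384 MB, 7% of the heap).
object ContainerSizing {
  val MinOverheadMb = 384
  val OverheadFactor = 0.07

  // Default off-heap overhead (MB) for a given executor heap size (MB).
  def defaultOverheadMb(executorMemoryMb: Int): Int =
    math.max((OverheadFactor * executorMemoryMb).toInt, MinOverheadMb)

  // Total memory requested from YARN for one executor container (MB).
  // An explicit spark.yarn.executor.memoryOverhead value replaces the default.
  def containerMb(executorMemoryMb: Int, overheadMb: Option[Int] = None): Int =
    executorMemoryMb + overheadMb.getOrElse(defaultOverheadMb(executorMemoryMb))

  def main(args: Array[String]): Unit = {
    // 4 GB heap with the default overhead vs. an explicit 1 GB overhead.
    println(containerMb(4096))             // 4096 + max(286, 384) = 4480
    println(containerMb(4096, Some(1024))) // 4096 + 1024 = 5120
  }
}
```

With spark-submit the property can be passed as `--conf spark.yarn.executor.memoryOverhead=1024` (the value is in MB); it enlarges the container without enlarging the JVM heap, which is the headroom the physical-memory check needs.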

RE: RDD caching, memory network input

2015-01-28 Thread Andrianasolo Fanilo
= PredictionReader.getFeatures(…).cache where getFeatures() loads the file and then parses it. From: Sandy Ryza [mailto:sandy.r...@cloudera.com] Sent: Wednesday, January 28, 2015 17:12 To: Andrianasolo Fanilo Cc: user@spark.apache.org Subject: Re: RDD caching, memory network input
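Sandy's question distinguishes caching the raw textFile lines from caching the parsed records. A minimal sketch of caching only the parsed form, assuming comma-separated numeric rows. `FeatureCaching`, `parseRow` and `parseAndCache` are hypothetical stand-ins, not the actual contents of PredictionReader.getFeatures. The sketch swaps plain .cache (StorageLevel.MEMORY_ONLY, deserialized objects) for StorageLevel.MEMORY_ONLY_SER, which stores partitions as serialized bytes and trades some CPU for a smaller in-memory footprint.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object FeatureCaching {
  // Hypothetical parser: one comma-separated row of doubles per line.
  def parseRow(line: String): Array[Double] =
    line.split(',').map(_.trim.toDouble)

  // Parse first, then persist only the parsed RDD. The raw text RDD is never
  // cached; it is re-read only if cached partitions are missing or evicted.
  def parseAndCache(lines: RDD[String]): RDD[Array[Double]] =
    lines.map(parseRow).persist(StorageLevel.MEMORY_ONLY_SER)

  def main(args: Array[String]): Unit = {
    // Master and other settings are expected to come from spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("feature-caching"))
    val features = parseAndCache(sc.textFile(args(0)))
    // The first action materializes the cache; later passes over `features`
    // (e.g. one per model configuration) read cached partitions, not HDFS.
    println(features.count())
    sc.stop()
  }
}
```

Whether the serialized level is the right trade-off depends on how much CPU each extra pass can afford; with the default .cache, partitions that do not fit in memory are simply recomputed from the input.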