Does feature size 43839 equal the number of terms? Check the output dimension of your feature vectorizer, and reduce the number of partitions to match the number of physical cores. I saw you set spark.storage.memoryFraction to 0.0; it may be better to keep the default. Also, please confirm the driver memory in the Executors tab of the Spark Web UI. -Xiangrui
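One back-of-the-envelope check worth doing here (my own arithmetic, not stated in the thread): at 43,839 features, the vectors only fit comfortably in a 6 GB executor if they stay sparse. If the TF-IDF vectors are materialized as dense arrays of doubles, a modest document count already exceeds the heap. The numbers below are a hedged sketch; the assumed per-document nonzero count and document count are illustrative, not taken from Jatin's data.

```python
# Rough memory estimate for 43,839-dimensional TF-IDF vectors.
# Dense vs. sparse storage; counts other than FEATURE_SIZE are assumptions.

FEATURE_SIZE = 43_839          # from the thread
BYTES_PER_DOUBLE = 8
ASSUMED_DOCS = 20_000          # hypothetical corpus size
ASSUMED_NONZEROS = 200         # hypothetical distinct terms per document

# Dense: every dimension stored as a double.
dense_per_doc = FEATURE_SIZE * BYTES_PER_DOUBLE          # 350,712 bytes (~342 KB)
dense_total_gb = dense_per_doc * ASSUMED_DOCS / 1024**3  # ~6.5 GB -- over a 6 GB executor

# Sparse: one double value plus one 4-byte index per nonzero entry.
sparse_per_doc = ASSUMED_NONZEROS * (BYTES_PER_DOUBLE + 4)  # 2,400 bytes
sparse_total_mb = sparse_per_doc * ASSUMED_DOCS / 1024**2   # ~46 MB

print(f"dense:  {dense_per_doc} bytes/doc, {dense_total_gb:.1f} GB total")
print(f"sparse: {sparse_per_doc} bytes/doc, {sparse_total_mb:.1f} MB total")
```

The point of the sketch: 13.2 MB of raw text can easily become gigabytes of dense vectors, so confirming that the vectorizer emits sparse vectors is a cheap first check.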
On Mon, Sep 22, 2014 at 5:48 AM, jatinpreet <jatinpr...@gmail.com> wrote:
> Hi, I have been facing an unusual issue with Naive Bayes training: I run
> out of heap space even with limited data during the training phase. I am
> trying to run it on a rudimentary standalone cluster of two development
> machines. I read data from an HBase table, convert the documents into
> TF-IDF vectors, and then feed the vectors to the Naive Bayes training
> API. The out-of-memory exception occurs during training. The strange part
> is that I am able to train with much more data when I read the documents
> from the local disk on the driver system. To give an idea, these are my
> configuration settings and feature size:
>
> Machines in cluster: 2
> Cores: 12 (8 + 4)
> Total executor memory: 13 GB (6.5 + 6.5)
> Executor memory: 6 GB
> Driver memory: 4 GB
> Feature size: 43839
> Categories/labels: 20
> Parallelism: 100
> spark.storage.memoryFraction: 0.0
> spark.shuffle.memoryFraction: 0.8
> Total text data size on disk: 13.2 MB
>
> Please help me solve this issue. I am able to run training on the same
> systems with Mahout, and that is unnerving. I can't use the HashingTF
> available with Spark due to the resultant decrease in accuracy, but with
> this feature size I expect Spark to run easily.
>
> Thanks,
> Jatin
> Novice Big Data Programmer
>
> ________________________________
> View this message in context: Out of memory exception in MLlib's naive
> Bayes classification training
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
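Pulling the two suggestions together, a minimal spark-defaults.conf sketch might look like the following. This assumes the Spark 1.x property names used in the thread; the default values (0.6 for storage, 0.2 for shuffle) are the documented Spark 1.x defaults, and the parallelism figure simply matches the 12 physical cores mentioned above.

```
# spark-defaults.conf -- hedged sketch, Spark 1.x property names
spark.storage.memoryFraction   0.6    # default; 0.0 leaves no heap for cached RDD blocks
spark.shuffle.memoryFraction   0.2    # default; 0.8 starves everything else during shuffles
spark.default.parallelism      12     # match physical cores instead of 100 partitions
```

With only 13.2 MB of input text, 100 partitions produce many tiny tasks; matching partitions to cores keeps per-task overhead down.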