Does feature size 43839 equal the number of terms? Check the output dimension of your feature vectorizer, and reduce the number of partitions to match the number of physical cores. I saw you set spark.storage.memoryFraction to 0.0; it may be better to keep the default. Also, please confirm the driver memory in the Executors tab of the Spark Web UI. -Xiangrui
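One back-of-the-envelope check worth doing here (my own arithmetic, not stated in the thread): at 43,839 features, the vectors only fit comfortably in a 6 GB executor if they stay sparse. If the TF-IDF vectors are materialized as dense arrays of doubles, a modest document count already exceeds the heap. The numbers below are a hedged sketch; the assumed per-document nonzero count and document count are illustrative, not taken from Jatin's data.

```python
# Rough memory estimate for 43,839-dimensional TF-IDF vectors.
# Dense vs. sparse storage; counts other than FEATURE_SIZE are assumptions.

FEATURE_SIZE = 43_839          # from the thread
BYTES_PER_DOUBLE = 8
ASSUMED_DOCS = 20_000          # hypothetical corpus size
ASSUMED_NONZEROS = 200         # hypothetical distinct terms per document

# Dense: every dimension stored as a double.
dense_per_doc = FEATURE_SIZE * BYTES_PER_DOUBLE          # 350,712 bytes (~342 KB)
dense_total_gb = dense_per_doc * ASSUMED_DOCS / 1024**3  # ~6.5 GB -- over a 6 GB executor

# Sparse: one double value plus one 4-byte index per nonzero entry.
sparse_per_doc = ASSUMED_NONZEROS * (BYTES_PER_DOUBLE + 4)  # 2,400 bytes
sparse_total_mb = sparse_per_doc * ASSUMED_DOCS / 1024**2   # ~46 MB

print(f"dense:  {dense_per_doc} bytes/doc, {dense_total_gb:.1f} GB total")
print(f"sparse: {sparse_per_doc} bytes/doc, {sparse_total_mb:.1f} MB total")
```

The point of the sketch: 13.2 MB of raw text can easily become gigabytes of dense vectors, so confirming that the vectorizer emits sparse vectors is a cheap first check.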
On Mon, Sep 22, 2014 at 5:48 AM, jatinpreet <jatinpr...@gmail.com> wrote:
> Hi, I have been facing an unusual issue with Naive Bayes training: I run
> out of heap space even with limited data during the training phase. I am
> trying to run it on a rudimentary standalone cluster of two development
> machines. I read data from an HBase table, convert the documents into
> TF-IDF vectors, and then feed the vectors to the Naive Bayes training
> API. The out-of-memory exception occurs during training. The strange part
> is that I am able to train with much more data when I read the documents
> from the local disk on the driver system. To give an idea, these are my
> configuration settings and feature size:
>
> Machines in cluster: 2
> Cores: 12 (8 + 4)
> Total executor memory: 13 GB (6.5 + 6.5)
> Executor memory: 6 GB
> Driver memory: 4 GB
> Feature size: 43839
> Categories/labels: 20
> Parallelism: 100
> spark.storage.memoryFraction: 0.0
> spark.shuffle.memoryFraction: 0.8
> Total text data size on disk: 13.2 MB
>
> Please help me solve this issue. I am able to run training on the same
> systems with Mahout, and that is unnerving. I can't use the HashingTF
> available with Spark due to the resultant decrease in accuracy, but with
> this feature size I expect Spark to run easily.
>
> Thanks,
> Jatin
> Novice Big Data Programmer
>
> ________________________________
> View this message in context: Out of memory exception in MLlib's naive
> Bayes classification training
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
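Pulling the two suggestions together, a minimal spark-defaults.conf sketch might look like the following. This assumes the Spark 1.x property names used in the thread; the default values (0.6 for storage, 0.2 for shuffle) are the documented Spark 1.x defaults, and the parallelism figure simply matches the 12 physical cores mentioned above.

```
# spark-defaults.conf -- hedged sketch, Spark 1.x property names
spark.storage.memoryFraction   0.6    # default; 0.0 leaves no heap for cached RDD blocks
spark.shuffle.memoryFraction   0.2    # default; 0.8 starves everything else during shuffles
spark.default.parallelism      12     # match physical cores instead of 100 partitions
```

With only 13.2 MB of input text, 100 partitions produce many tiny tasks; matching partitions to cores keeps per-task overhead down.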