Does the feature size 43839 equal the number of terms? Check the output
dimension of your feature vectorizer, and reduce the number of partitions
to match the number of physical cores. I saw you set
spark.storage.memoryFraction to 0.0; it is probably better to keep the
default. Also, please confirm the driver memory in the Executors tab of
the Spark Web UI. -Xiangrui
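For reference, the suggestions above might look roughly like this in the driver program. This is only a sketch: the 12 cores come from the thread below, the RDD name `labeledPoints` is a placeholder for your TF-IDF `LabeledPoint` RDD, and the memory-fraction lines just illustrate leaving the defaults in place rather than a recommended value.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.NaiveBayes

// Leave spark.storage.memoryFraction at its default (0.6) instead of 0.0,
// and likewise leave spark.shuffle.memoryFraction at its default (0.2).
val conf = new SparkConf().setAppName("NaiveBayesTraining")
// No .set("spark.storage.memoryFraction", ...) override here.
val sc = new SparkContext(conf)

// `labeledPoints` is assumed to be your RDD[LabeledPoint] of TF-IDF vectors.
// Repartition to roughly the number of physical cores (12 in this thread)
// instead of parallelism = 100, which is excessive for ~13 MB of data.
val training = labeledPoints.repartition(12).cache()

val model = NaiveBayes.train(training, lambda = 1.0)
```

Whether 0.6 and 0.2 are the defaults depends on your Spark version; check the configuration page for the release you run.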

On Mon, Sep 22, 2014 at 5:48 AM, jatinpreet <jatinpr...@gmail.com> wrote:
> Hi, I have been facing an unusual issue with Naive Bayes training. I run
> out of heap space even with limited data during the training phase. I am
> trying to run it on a rudimentary cluster of two development machines
> in standalone mode. I am reading data from an HBase table, converting it
> into TF-IDF vectors, and then feeding the vectors to the Naive Bayes
> training API. The out-of-memory exception occurs while training. The
> strange part is that I am able to train with much more data when I read
> the documents from my local disk on the driver system. To give an idea,
> the following are my configuration settings and feature size:
>
> Machines in cluster: 2
> Cores: 12 (8+4)
> Total executor memory: 13 GB (6.5+6.5)
> Executor memory: 6 GB
> Driver memory: 4 GB
> Feature size: 43839
> Categories/labels: 20
> Parallelism: 100
> spark.storage.memoryFraction: 0.0
> spark.shuffle.memoryFraction: 0.8
> Total text data size on disk: 13.2 MB
>
> Please help me in solving this issue. I am able to run training on the
> same systems with Mahout, and that is unnerving for me. I can't use the
> HashingTF available with Spark due to the resultant decrease in
> accuracy, but with this feature size, I expect Spark to run easily.
>
> Thanks,
> Jatin
> Novice Big Data Programmer
>
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

