subject:"Re\: Out of memory exception in MLlib's naive baye's classification training"

Re: Out of memory exception in MLlib's naive baye's classification training

2015-08-20 Thread minerva

Hello, I used Mahout for text classification and am now trying Spark. I had the same problem training naive Bayes with (only) 569 documents. I solved it by using htf = HashingTF(5000) instead of htf = HashingTF() (default feature space 2^20). I don't know if it can be considered a long-term solution (w
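To illustrate why passing a smaller size to HashingTF shrinks the feature space, here is a minimal pure-Python sketch of the hashing trick. It is not Spark's implementation: the function name hashing_tf and the use of zlib.crc32 are assumptions chosen for illustration, and Spark uses its own hash function.

```python
import zlib
from collections import Counter


def hashing_tf(terms, num_features):
    """Map a list of terms to a sparse {index: count} term-frequency vector.

    Hypothetical helper, not Spark's HashingTF. Every term is hashed into
    one of num_features buckets, so the vector width is fixed up front and
    does not grow with the vocabulary.
    """
    counts = Counter()
    for term in terms:
        # crc32 is used only because it is deterministic across runs.
        index = zlib.crc32(term.encode("utf-8")) % num_features
        counts[index] += 1
    return dict(counts)


doc = "spark mllib naive bayes spark".split()
vec = hashing_tf(doc, 5000)  # 5000 buckets, as in the message above
print(vec)
```

A smaller bucket count means a smaller model, at the cost of more hash collisions between unrelated terms.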

Re: Out of memory exception in MLlib's naive baye's classification training

2014-09-25 Thread Xiangrui Meng

For the vectorizer, what's the output feature dimension and are you creating sparse vectors or dense vectors? The model on the driver consists of numClasses * numFeatures doubles. However, the driver needs more memory in order to receive the task result (of the same size) from executors. So you nee
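The sizing rule in this message (numClasses * numFeatures doubles on the driver, plus a task result of the same size) can be turned into a rough back-of-the-envelope estimate. The sketch below assumes 8 bytes per double and a hypothetical class count of 20; the feature sizes 5000, 43839, and 2^20 are the ones mentioned in this thread. The factor of two is only a rough lower bound, not a measured figure.

```python
BYTES_PER_DOUBLE = 8


def model_bytes(num_classes, num_features):
    """Bytes needed for numClasses * numFeatures doubles."""
    return num_classes * num_features * BYTES_PER_DOUBLE


def driver_lower_bound_bytes(num_classes, num_features):
    """Model plus one incoming task result of the same size (rough lower bound)."""
    return 2 * model_bytes(num_classes, num_features)


NUM_CLASSES = 20  # hypothetical; the thread does not give the real label count

for num_features in (5000, 43839, 2**20):
    model_mib = model_bytes(NUM_CLASSES, num_features) / 2**20
    driver_mib = driver_lower_bound_bytes(NUM_CLASSES, num_features) / 2**20
    print(f"{num_features:>8} features: model {model_mib:8.2f} MiB, "
          f"driver at least {driver_mib:8.2f} MiB")
```

Because the estimate scales linearly with the number of classes, an unexpectedly large label set inflates the model just as much as a large feature dimension does.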

Re: Out of memory exception in MLlib's naive baye's classification training

2014-09-24 Thread jatinpreet

Hi, I was able to get the training running in local mode with default settings. There was a problem with the document labels, which were quite large (not 20 as suggested earlier). I am currently training 175000 documents on a single node with 2GB of executor memory and 5GB of driver memory successfull

Re: Out of memory exception in MLlib's naive baye's classification training

2014-09-23 Thread jatinpreet

Xiangrui, thanks for replying. I am using a subset of the newsgroup20 data. I will send you the vectorized data for analysis shortly. I have tried running in local mode as well, but I get the same OOM exception. I started with 4GB of data but then moved to a smaller set to verify that everything was

Re: Out of memory exception in MLlib's naive baye's classification training

2014-09-23 Thread Xiangrui Meng

Your dataset is small. NaiveBayes should work under the default settings, even in local mode. Could you try local mode first without changing any Spark settings? Since your dataset is small, could you save the vectorized data (RDD[LabeledPoint]) and send me a sample? I want to take a look at the fea

Re: Out of memory exception in MLlib's naive baye's classification training

2014-09-23 Thread jatinpreet

I get the following stack trace, if it is of any help.
14/09/23 15:46:02 INFO scheduler.DAGScheduler: failed: Set()
14/09/23 15:46:02 INFO scheduler.DAGScheduler: Missing parents for Stage 7: List()
14/09/23 15:46:02 INFO scheduler.DAGScheduler: Submitting Stage 7 (MapPartitionsRDD[24] at combineByK

Re: Out of memory exception in MLlib's naive baye's classification training

2014-09-23 Thread jatinpreet

Xiangrui, yes, the total number of terms is 43839. I have also tried running it with different values of parallelism, ranging from 1/core to 10/core. I also used multiple configurations, like setting spark.storage.memoryFraction and spark.shuffle.memoryFraction to default values. The point to note

Re: Out of memory exception in MLlib's naive baye's classification training

2014-09-22 Thread Xiangrui Meng

Does the feature size 43839 equal the number of terms? Check the output dimension of your feature vectorizer and reduce the number of partitions to match the number of physical cores. I saw you set spark.storage.memoryFraction to 0.0. Maybe it is better to keep the default. Also please confirm the driver

8 matches
