Xiangrui, thanks for replying.

I am using a subset of the 20 Newsgroups dataset. I will send you the
vectorized data for analysis shortly.

I have tried running in local mode as well, but I get the same OOM exception.
I started with 4 GB of data and then moved to a smaller set to verify that
everything was fine, but I get the error on this smaller dataset too.
Ultimately, I want the system to handle any amount of data we throw at it
without OOM exceptions.
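
For what it's worth, one thing worth ruling out first is the default JVM heap size: in local mode the driver and executors share a single JVM, so only the driver heap matters there. A hedged sketch of how the memory flags might be passed at submission (the jar name and class below are hypothetical placeholders, not from my job):

```
# Local mode: everything runs in one JVM, so raise --driver-memory.
# (app jar and main class are hypothetical placeholders)
spark-submit \
  --master local[4] \
  --driver-memory 4g \
  --class com.example.TrainNaiveBayes \
  my-app.jar

# Cluster mode: per-class feature sums are aggregated back at the driver,
# so both --executor-memory and --driver-memory can be the bottleneck.
spark-submit \
  --master yarn \
  --driver-memory 4g \
  --executor-memory 4g \
  --class com.example.TrainNaiveBayes \
  my-app.jar
```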

My concern is how Spark will behave with a lot of data during the training
and prediction phases. I need to know exactly what the memory requirements
are for a given dataset and where the memory is needed (driver or executors).
If there are any guidelines for this, that would be great.
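
Not a substitute for your guidance, but as a rough starting point for sizing: MLlib's trained NaiveBayesModel holds a dense array of class log-priors (pi) and a dense numClasses x numFeatures matrix of log-conditional probabilities (theta), and per-class feature sums of the same shape are combined at the driver during training. A back-of-envelope sketch (the vocabulary size here is an assumption for illustration, not from my data):

```python
# Rough estimate of the driver-side memory the NaiveBayesModel itself needs:
# pi is one 8-byte double per class, theta is numClasses x numFeatures
# 8-byte doubles. This ignores JVM object overhead, so treat it as a floor.

def naive_bayes_model_bytes(num_classes, num_features):
    pi = num_classes * 8                    # log class priors
    theta = num_classes * num_features * 8  # log conditional probabilities
    return pi + theta

# e.g. 20 classes with a 1,000,000-term vocabulary:
print(naive_bayes_model_bytes(20, 1_000_000))  # → 160000160 bytes (~153 MiB)
```

With a hashed or pruned vocabulary the feature dimension, and hence this footprint, shrinks proportionally.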

Thanks, 
Jatin 



-----
Novice Big Data Programmer
--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Out-of-memory-exception-in-MLlib-s-naive-baye-s-classification-training-tp14809p14969.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
