Xiangrui,

Thanks for replying. I am using a subset of the 20 Newsgroups data. I will send you the vectorized data for analysis shortly.
I have tried running in local mode as well, but I get the same OOM exception. I started with 4 GB of data and then moved to a smaller set to verify that everything was fine, but I get the error on this small data set too. Ultimately I want the system to handle any amount of data we throw at it without OOM exceptions. My concern is how Spark will behave with a lot of data during the training and prediction phases. I need to know exactly what the memory requirements are for a given data set and where that memory is needed (driver or executor). If there are any guidelines for this, that would be great.

Thanks,
Jatin

-----
Novice Big Data Programmer
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Out-of-memory-exception-in-MLlib-s-naive-baye-s-classification-training-tp14809p14969.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
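On the driver-memory question: MLlib's NaiveBayesModel holds a dense num_classes x num_features matrix of doubles (theta), and training aggregates per-class feature sums of the same shape back to the driver, so a rough lower bound on driver memory can be estimated from those two dimensions alone. Below is a minimal back-of-envelope sketch; the function name and the `copies` factor (one aggregation buffer plus the final model) are my own assumptions for illustration, not figures from the Spark source, and the feature count is hypothetical:

```python
def naive_bayes_driver_mem_mb(num_classes, num_features, copies=2):
    """Back-of-envelope estimate (in MB) of driver-side memory for
    MLlib Naive Bayes: a dense num_classes x num_features matrix of
    8-byte doubles, times `copies` to account for the aggregation
    buffer plus the final model (an assumption, not a measured
    figure -- JVM object overhead is ignored)."""
    bytes_per_double = 8
    return num_classes * num_features * bytes_per_double * copies / (1024 ** 2)

# Hypothetical numbers: 20 newsgroup classes, 2**20 hashed TF features.
print(naive_bayes_driver_mem_mb(20, 1 << 20))  # -> 320.0
```

With a high-dimensional vectorization like this, a few hundred MB on the driver is plausible even for a small input file, which would explain an OOM under a default driver heap regardless of how small the training set is.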