Could you share the error log? What do you mean by "500 instead of 200"? If that is the number of input files, try calling `repartition` before training naive Bayes. It works best when the number of partitions matches the number of cores, or is even smaller.

-Xiangrui
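To make the `repartition` suggestion concrete, here is a minimal sketch, not a tested fix for this particular crash. `RepartitionSketch`, `targetPartitions`, `trainRepartitioned` and the `totalCores` parameter are hypothetical names used only for illustration; `training` is assumed to be the `RDD[LabeledPoint]` from the quoted code, and the core count is assumed to be known to the caller.

```scala
import org.apache.spark.mllib.classification.{NaiveBayes, NaiveBayesModel}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

object RepartitionSketch {
  // Hypothetical helper: cap the partition count at the number of cores,
  // and never go below one partition.
  def targetPartitions(currentPartitions: Int, totalCores: Int): Int =
    math.max(1, math.min(currentPartitions, totalCores))

  // Assumption: totalCores is supplied by the caller (for example, taken
  // from the cluster configuration); it is not derived here.
  def trainRepartitioned(training: RDD[LabeledPoint], totalCores: Int): NaiveBayesModel = {
    val n = targetPartitions(training.partitions.length, totalCores)
    // repartition shuffles the data into n partitions; cache keeps the
    // result in memory so training does not recompute it.
    val compact = training.repartition(n).cache()
    NaiveBayes.train(compact, lambda = 1.0)
  }
}
```

With 500 input partitions and 8 cores, `targetPartitions` returns 8, so the classifier would train on 8 partitions. Since the partition count only shrinks here, `coalesce(n)` is an alternative that avoids the full shuffle.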
On Tue, Feb 10, 2015 at 10:34 PM, rkgurram <rkgur...@gmail.com> wrote:
> Further I have tried HttpBroadcast but that too does not work.
>
> It is almost like there is a MemoryLeak because if I increase the input
> files to "500" instead of "200" the system crashes early.
>
>
> The code is as follows
> ========================
>
>     logger.info("Training the model Fold:["+ fold +"]")
>     logger.info("Step 1: Split the input into Training and Testing sets")
>     val splits = labeledPointRDD.randomSplit(Array(0.6, 0.4), seed = 11L)
>     logger.info("Step 1: splits successful...")
>
>     val training = splits(0)
>     val test = splits(1)
>     status = ModelStatus.IN_TRAINING
>     //logger.info("Fold:[" + fold + "] Training count: " + training.count() + " Testing/Verification count:" + test.count())
>
>     logger.info("Step 2: Train the NB classifier")
>     model = NaiveBayes.train(training, lambda = 1.0)
>     logger.info("Step 2: NB model training complete Fold:[" + fold + "]")
>
>     logger.info("Step 3: Testing/Verification of the model")
>     status = ModelStatus.IN_VERIFICATION
>     val predictionAndLabel = test.map(p => (model.predict(p.features), p.label))
>     val arry = predictionAndLabel.filter(x => x._1 == x._2)
>     val accuracy = 1.0 * predictionAndLabel.filter(x => x._1 == x._2).count() / test.count()
>     logger.info("Step 3: Testing complete")
>     status = ModelStatus.INITIALIZED
>     logger.info("Fold["+ fold +"] Accuracy:[" + accuracy + "] Model Status:[" + status + "]")
>
> -Ravi
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Naive-Bayes-model-fails-after-a-few-predictions-tp21592p21593.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org