Further, I have tried HttpBroadcast, but that does not work either.
It looks almost like a memory leak, because if I increase the number of
input files from 200 to 500 the system crashes earlier.
The code is as follows:
========================
logger.info("Training the model Fold:["+ fold +"]")
logger.info("Step 1: Split the input into Training and Testing sets")
val splits = labeledPointRDD.randomSplit(Array(0.6, 0.4), seed = 11L)
logger.info("Step 1: splits successful...")
val training = splits(0)
val test = splits(1)
status = ModelStatus.IN_TRAINING
//logger.info("Fold:[" + fold + "] Training count: " + training.count() + " Testing/Verification count:" + test.count())
logger.info("Step 2: Train the NB classifier")
model = NaiveBayes.train(training, lambda = 1.0)
logger.info("Step 2: NB model training complete Fold:[" + fold + "]")
logger.info("Step 3: Testing/Verification of the model")
status = ModelStatus.IN_VERIFICATION
val predictionAndLabel = test.map(p => (model.predict(p.features), p.label))
// Keep only the pairs where the prediction matches the label, and reuse them below
val arry = predictionAndLabel.filter(x => x._1 == x._2)
val accuracy = 1.0 * arry.count() / test.count()
logger.info("Step 3: Testing complete")
status = ModelStatus.INITIALIZED
logger.info("Fold["+ fold +"] Accuracy:[" + accuracy + "] Model Status:[" + status + "]")
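
For comparison, here is a minimal sketch of the same split/train/verify steps written as a standalone function. It is not a confirmed fix for the crash. The names `FoldEvaluation` and `evaluateFold` are hypothetical and not part of the code above. The sketch rests on two assumptions: first, that `model` and `status` above are fields of an enclosing class (they are assigned without `val`), in which case referencing `model` inside `test.map` would pull the whole enclosing object into the task closure; second, that the test split is small enough to cache, since it is scanned twice.

```scala
import org.apache.spark.mllib.classification.NaiveBayes
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Hypothetical helper object; the name is an assumption made for illustration.
object FoldEvaluation {

  // Splits the data 60/40, trains Naive Bayes on the first part and
  // returns the accuracy measured on the second part.
  def evaluateFold(data: RDD[LabeledPoint], seed: Long = 11L): Double = {
    val Array(training, test) = data.randomSplit(Array(0.6, 0.4), seed)

    // The test split is scanned twice below (count + filter), so cache it.
    test.cache()
    try {
      // Local val: the closure below captures only the model,
      // not an enclosing object that happens to hold it as a field.
      val model = NaiveBayes.train(training, lambda = 1.0)

      val total = test.count()
      val correct = test.filter(p => model.predict(p.features) == p.label).count()

      // Guard against an empty test split.
      if (total == 0) 0.0 else correct.toDouble / total
    } finally {
      // Release the cached blocks so that repeated folds do not accumulate them.
      test.unpersist()
    }
  }
}
```

Calling `FoldEvaluation.evaluateFold(labeledPointRDD)` once per fold would give the same accuracy figure that is logged above. Whether it changes the memory behaviour would have to be checked against the executor logs.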
-Ravi
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Naive-Bayes-model-fails-after-a-few-predictions-tp21592p21593.html