Re: Naive Bayes model fails after a few predictions

2015-02-17 Thread Xiangrui Meng
Could you share the error log? What do you mean by "500 instead of
200"? If this is the number of files, try calling `repartition` before
training naive Bayes, which works best when the number of
partitions matches the number of cores, or is even smaller. -Xiangrui
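
For readers of the archive, the `repartition` suggestion above might look
roughly like the sketch below. It is only an illustration, not code from
either poster: the object and method names (`RepartitionSketch`,
`trainWithRepartition`) are hypothetical, and using `sc.defaultParallelism`
as a stand-in for "number of cores" is an assumption that depends on how
the cluster is configured.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.classification.{NaiveBayes, NaiveBayesModel}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Hypothetical helper, not part of the original post.
object RepartitionSketch {
  def trainWithRepartition(sc: SparkContext,
                           data: RDD[LabeledPoint]): NaiveBayesModel = {
    // Assumption: defaultParallelism roughly reflects the cores available.
    val numPartitions = sc.defaultParallelism

    // Many small input files tend to yield one partition per file;
    // repartition shuffles the data into a fixed, smaller number of partitions.
    val compact = data.repartition(numPartitions).cache()

    NaiveBayes.train(compact, lambda = 1.0)
  }
}
```

When the goal is only to reduce the partition count, `coalesce(numPartitions)`
may be worth trying as well, since it can avoid a full shuffle.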

On Tue, Feb 10, 2015 at 10:34 PM, rkgurram rkgur...@gmail.com wrote:
 Further, I have tried HttpBroadcast, but that does not work either.

 It almost looks like a memory leak, because if I increase the input
 files to 500 instead of 200 the system crashes earlier.


 The code is as follows:
 

 logger.info("Training the model Fold:[" + fold + "]")
 logger.info("Step 1: Split the input into Training and Testing sets")
 val splits = labeledPointRDD.randomSplit(Array(0.6, 0.4), seed = 11L)
 logger.info("Step 1: splits successful...")

 val training = splits(0)
 val test = splits(1)
 status = ModelStatus.IN_TRAINING
 //logger.info("Fold:[" + fold + "] Training count: " + training.count()
 //  + " Testing/Verification count: " + test.count())

 logger.info("Step 2: Train the NB classifier")
 model = NaiveBayes.train(training, lambda = 1.0)
 logger.info("Step 2: NB model training complete Fold:[" + fold + "]")

 logger.info("Step 3: Testing/Verification of the model")
 status = ModelStatus.IN_VERIFICATION
 val predictionAndLabel = test.map(p => (model.predict(p.features), p.label))
 val arry = predictionAndLabel.filter(x => x._1 == x._2)
 val accuracy = 1.0 * predictionAndLabel.filter(x => x._1 == x._2).count() / test.count()
 logger.info("Step 3: Testing complete")
 status = ModelStatus.INITIALIZED
 logger.info("Fold[" + fold + "] Accuracy:[" + accuracy + "] Model Status:[" + status + "]")




 -Ravi



 --
 View this message in context: 
 http://apache-spark-user-list.1001560.n3.nabble.com/Naive-Bayes-model-fails-after-a-few-predictions-tp21592p21593.html
 Sent from the Apache Spark User List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
 For additional commands, e-mail: user-h...@spark.apache.org





Naive Bayes model fails after a few predictions

2015-02-10 Thread rkgurram
)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.rdd.FlatMappedRDD.compute(FlatMappedRDD.scala:33)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Failed to get broadcast_0_piece0 of broadcast_0
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1$$anonfun$2.apply(TorrentBroadcast.scala:137)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply$mcVI$sp(TorrentBroadcast.scala:136)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$org$apache$spark$broadcast$TorrentBroadcast$$readBlocks$1.apply(TorrentBroadcast.scala:119)
at scala.collection.immutable.List.foreach(List.scala:381)
at org.apache.spark.broadcast.TorrentBroadcast.org$apache$spark$broadcast$TorrentBroadcast$$readBlocks(TorrentBroadcast.scala:119)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock$1.apply(TorrentBroadcast.scala:174)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1000)
... 23 more

--

Regards
-Ravi



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Naive-Bayes-model-fails-after-a-few-predictions-tp21592.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
