Hi All,

I have successfully accessed my MongoDB instance from Spark. After creating a NewHadoopRDD and calling first(), I get the data back from the database correctly. However, if I call first() a second time (without calling anything else in between), Spark crashes with the following message:

org.apache.spark.rdd.NewHadoopRDD[java.lang.Object,org.bson.BSONObject] = NewHadoopRDD[1] at NewHadoopRDD at <console>:36

scala> a.first()
13/10/09 16:58:49 INFO spark.SparkContext: Starting job: first at <console>:39
13/10/09 16:58:49 INFO scheduler.DAGScheduler: Got job 1 (first at <console>:39) with 1 output partitions (allowLocal=true)
13/10/09 16:58:49 INFO scheduler.DAGScheduler: Final stage: Stage 1 (first at <console>:39)
13/10/09 16:58:49 INFO scheduler.DAGScheduler: Parents of final stage: List()
13/10/09 16:58:49 INFO scheduler.DAGScheduler: Missing parents: List()
13/10/09 16:58:49 INFO scheduler.DAGScheduler: Computing the requested partition locally
13/10/09 16:58:49 INFO rdd.NewHadoopRDD: Input split: MongoInputSplit{URI=mongodb://mongo12.mit.edu/local.testCollection, keyField=_id, min=null, max=null, query={ }, sort={ }, fields={ }, limit=0, skip=0, notimeout=false}
13/10/09 16:58:49 INFO scheduler.DAGScheduler: Failed to run first at <console>:39
java.lang.NullPointerException
    at com.mongodb.DBApiLayer$Result.hasNext(DBApiLayer.java:416)
    at com.mongodb.DBCursor._hasNext(DBCursor.java:464)
    at com.mongodb.DBCursor.hasNext(DBCursor.java:484)
    at com.mongodb.hadoop.input.MongoRecordReader.nextKeyValue(MongoRecordReader.java:75)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:96)
    at scala.collection.Iterator$$anon$18.hasNext(Iterator.scala:381)
    at scala.collection.Iterator$class.foreach(Iterator.scala:772)
    at scala.collection.Iterator$$anon$18.foreach(Iterator.scala:379)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:102)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:250)
    at scala.collection.Iterator$$anon$18.toBuffer(Iterator.scala:379)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:237)
    at scala.collection.Iterator$$anon$18.toArray(Iterator.scala:379)
    at org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:768)
    at org.apache.spark.rdd.RDD$$anonfun$12.apply(RDD.scala:768)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:758)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:758)
    at org.apache.spark.scheduler.DAGScheduler.runLocallyWithinThread(DAGScheduler.scala:484)
    at org.apache.spark.scheduler.DAGScheduler$$anon$2.run(DAGScheduler.scala:470)
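For context, the RDD is created roughly like this in the shell (a minimal sketch, not my exact code: the URI is taken from the input split in the log above, and the configuration key and key/value classes are the standard mongo-hadoop connector ones):

import org.apache.hadoop.conf.Configuration
import org.bson.BSONObject
import com.mongodb.hadoop.MongoInputFormat

// sc is the SparkContext provided by the Spark shell.
// "mongo.input.uri" is the mongo-hadoop connector's standard config key;
// the URI here matches the MongoInputSplit in the log above.
val config = new Configuration()
config.set("mongo.input.uri", "mongodb://mongo12.mit.edu/local.testCollection")

// Key/value classes match the RDD type printed by the REPL:
// NewHadoopRDD[java.lang.Object, org.bson.BSONObject]
val a = sc.newAPIHadoopRDD(
  config,
  classOf[MongoInputFormat],
  classOf[Object],
  classOf[BSONObject])

a.first()  // returns a document correctly the first time
a.first()  // the second call throws the NullPointerException shown above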

Any ideas what I'm doing wrong? Is this a Mongo driver problem or a Spark problem?

Best,

Yadid
