Here's a piece of code which works well for us (Spark 1.4.1):

    Configuration bsonDataConfig = new Configuration();
    bsonDataConfig.set("mongo.job.input.format",
        "com.mongodb.hadoop.BSONFileInputFormat");
    Configuration predictionsConfig = new Configuration();
    predictionsConfig.set("mongo.output.uri", mongodbUri);

    JavaPairRDD<Object, BSONObject> bsonRatingsData = sc.newAPIHadoopFile(
        ratingsUri, BSONFileInputFormat.class, Object.class,
        BSONObject.class, bsonDataConfig);

Thanks
Best Regards

On Mon, Aug 31, 2015 at 12:59 PM, Deepesh Maheshwari <deepesh.maheshwar...@gmail.com> wrote:

> Hi, I am using <spark.version>1.3.0</spark.version>
>
> I am not getting a constructor for the above values.
>
> [image: Inline image 1]
>
> So, I tried to shuffle the values in the constructor.
>
> [image: Inline image 2]
>
> But it is giving this error. Please suggest.
>
> [image: Inline image 3]
>
> Best Regards
>
> On Mon, Aug 31, 2015 at 12:43 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
>
>> Can you try with these key/value classes and see the performance?
>>
>>     inputFormatClassName = "com.mongodb.hadoop.MongoInputFormat"
>>     keyClassName = "org.apache.hadoop.io.Text"
>>     valueClassName = "org.apache.hadoop.io.MapWritable"
>>
>> Taken from the Databricks blog
>> <https://databricks.com/blog/2015/03/20/using-mongodb-with-spark.html>
>>
>> Thanks
>> Best Regards
>>
>> On Mon, Aug 31, 2015 at 12:26 PM, Deepesh Maheshwari <deepesh.maheshwar...@gmail.com> wrote:
>>
>>> Hi, I am trying to read MongoDB in Spark with newAPIHadoopRDD.
>>>
>>> /**** Code *****/
>>>
>>>     config.set("mongo.job.input.format",
>>>         "com.mongodb.hadoop.MongoInputFormat");
>>>     config.set("mongo.input.uri", SparkProperties.MONGO_OUTPUT_URI);
>>>     config.set("mongo.input.query", "{host: 'abc.com'}");
>>>
>>>     JavaSparkContext sc = new JavaSparkContext("local", "MongoOps");
>>>
>>>     JavaPairRDD<Object, BSONObject> mongoRDD = sc.newAPIHadoopRDD(config,
>>>         com.mongodb.hadoop.MongoInputFormat.class, Object.class,
>>>         BSONObject.class);
>>>
>>>     long count = mongoRDD.count();
>>>
>>> There are about 1.5 million records.
>>> Though I am getting the data, the read operation took around 15 minutes to read the whole set.
>>>
>>> Is this API really that slow, or am I missing something?
>>> Please suggest if there is an alternate approach to read data from Mongo faster.
>>>
>>> Thanks,
>>> Deepesh
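[Editor's note] Pulling the working snippet from the top of the thread into one self-contained class, here is a minimal sketch. It is not a drop-in implementation: it assumes Spark 1.4.x and the mongo-hadoop connector jars are on the classpath, and the class name `BsonReadSketch` and the command-line path argument are illustrative placeholders.

    // Sketch: count records from a mongodump .bson file via BSONFileInputFormat,
    // as in the snippet at the top of the thread. Assumes Spark 1.4.x and the
    // mongo-hadoop connector on the classpath (both are external dependencies).
    import org.apache.hadoop.conf.Configuration;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.bson.BSONObject;
    import com.mongodb.hadoop.BSONFileInputFormat;

    public class BsonReadSketch {
        public static void main(String[] args) {
            // Path to a .bson dump (local or HDFS), passed as an argument.
            String ratingsUri = args[0];

            // "local[*]" uses all cores; plain "local", as in the original
            // question, runs with a single thread, which by itself can make a
            // full scan of ~1.5M records look much slower than it needs to be.
            SparkConf conf = new SparkConf()
                .setMaster("local[*]")
                .setAppName("BsonReadSketch");
            JavaSparkContext sc = new JavaSparkContext(conf);

            Configuration bsonDataConfig = new Configuration();
            bsonDataConfig.set("mongo.job.input.format",
                "com.mongodb.hadoop.BSONFileInputFormat");

            JavaPairRDD<Object, BSONObject> bsonRatingsData = sc.newAPIHadoopFile(
                ratingsUri, BSONFileInputFormat.class, Object.class,
                BSONObject.class, bsonDataConfig);

            System.out.println(bsonRatingsData.count());
            sc.stop();
        }
    }

Reading a static .bson dump avoids round-trips to a live mongod, which is one reason the BSONFileInputFormat path above performed well for the first poster compared with querying MongoDB directly through MongoInputFormat.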