Can you try using sc.newAPIHadoop**? There are two kinds of classes because the Hadoop API for input and output formats underwent a significant change a few years ago.
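For reference, here is a minimal sketch of what the new-API call could look like with the mapreduce-based com.mongodb.hadoop.MongoInputFormat (the mongo.input.uri value and collection name below are only illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.bson.BSONObject;

import com.mongodb.hadoop.MongoInputFormat;

public class MongoReadSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local", "Simple App");

        // mongo.input.uri points the connector at a collection;
        // the URI here is illustrative only.
        Configuration config = new Configuration();
        config.set("mongo.input.uri", "mongodb://localhost:27017/mydb.mycollection");

        // newAPIHadoopRDD expects an org.apache.hadoop.mapreduce.InputFormat,
        // which is the API that com.mongodb.hadoop.MongoInputFormat implements.
        JavaPairRDD<Object, BSONObject> documents = sc.newAPIHadoopRDD(
                config, MongoInputFormat.class, Object.class, BSONObject.class);

        System.out.println("Document count: " + documents.count());
    }
}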
TD

On Tue, Feb 4, 2014 at 5:58 AM, Sampo Niskanen <[email protected]> wrote:

> Hi,
>
> Thanks for the pointer. However, I'm still unable to generate the RDD
> using MongoInputFormat. I'm trying to add the mongo-hadoop connector to
> the Java SimpleApp in the quickstart at
> http://spark.incubator.apache.org/docs/latest/quick-start.html
>
> The mongo-hadoop connector contains two versions of MongoInputFormat, one
> extending org.apache.hadoop.mapreduce.InputFormat<Object, BSONObject>, the
> other extending org.apache.hadoop.mapred.InputFormat<Object, BSONObject>.
> Neither of them is accepted by the compiler, and I'm unsure why:
>
>     JavaSparkContext sc = new JavaSparkContext("local", "Simple App");
>     sc.hadoopRDD(job, com.mongodb.hadoop.mapred.MongoInputFormat.class,
>         Object.class, BSONObject.class);
>     sc.hadoopRDD(job, com.mongodb.hadoop.MongoInputFormat.class,
>         Object.class, BSONObject.class);
>
> Eclipse gives the following error for both of the latter two lines:
>
>     Bound mismatch: The generic method hadoopRDD(JobConf, Class<F>,
>     Class<K>, Class<V>) of type JavaSparkContext is not applicable for the
>     arguments (JobConf, Class<MongoInputFormat>, Class<Object>,
>     Class<BSONObject>). The inferred type MongoInputFormat is not a valid
>     substitute for the bounded parameter <F extends InputFormat<K,V>>
>
> I'm using Spark 0.9.0. Might this be caused by a conflict of Hadoop
> versions? I downloaded the mongo-hadoop connector for Hadoop 2.2. I
> haven't figured out how to select which Hadoop version Spark uses when it
> is required from an sbt file. (The sbt file is the one described in the
> quickstart.)
>
> Thanks for any help.
>
> Best regards,
> Sampo N.
>
>
> On Fri, Jan 31, 2014 at 5:34 AM, Tathagata Das <[email protected]> wrote:
>
>> I walked through the example in the second link you gave. The Treasury
>> Yield example referred to there is here:
>> https://github.com/mongodb/mongo-hadoop/blob/master/examples/treasury_yield/src/main/java/com/mongodb/hadoop/examples/treasury/TreasuryYieldXMLConfigV2.java
>> Note the InputFormat and OutputFormat used in the job configuration. These
>> specify how to read data from and write data to MongoDB. You should be
>> able to use the same InputFormat and OutputFormat classes in Spark as
>> well. To save files to MongoDB, use yourRDD.saveAsHadoopFile(... specify
>> the output format class ...), and to read from MongoDB, use
>> sparkContext.hadoopFile(... specify the input format class ...).
>>
>> TD
>>
>>
>> On Thu, Jan 30, 2014 at 12:36 PM, Sampo Niskanen <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> We're starting to build an analytics framework for our wellness service.
>>> While our data is not yet Big, we'd like to use a framework that will
>>> scale as needed, and Spark seems to be the best around.
>>>
>>> I'm new to Hadoop and Spark, and I'm having difficulty figuring out how
>>> to use Spark in connection with MongoDB. Apparently I should be able to
>>> use the mongo-hadoop connector (https://github.com/mongodb/mongo-hadoop)
>>> with Spark as well, but I haven't figured out how.
>>>
>>> I've run through the Spark tutorials and have been able to set up a
>>> single-machine Hadoop system with the MongoDB connector as instructed at
>>> http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
>>> and
>>> http://docs.mongodb.org/ecosystem/tutorial/getting-started-with-hadoop/
>>>
>>> Could someone give some instructions or pointers on how to configure and
>>> use the mongo-hadoop connector with Spark? I haven't been able to find
>>> any documentation about this.
>>>
>>> Thanks.
>>>
>>> Best regards,
>>> Sampo N.
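For the write path discussed above, the new-API counterpart of saveAsHadoopFile is saveAsNewAPIHadoopFile. A rough sketch, assuming com.mongodb.hadoop.MongoOutputFormat and the mongo.output.uri setting; the URI, key, and document are illustrative, and the path argument is only there to satisfy the API since the connector writes to the configured URI:

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.bson.BSONObject;
import org.bson.BasicBSONObject;

import com.mongodb.hadoop.MongoOutputFormat;

import scala.Tuple2;

public class MongoWriteSketch {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local", "Simple App");

        // mongo.output.uri names the target collection; illustrative URI.
        Configuration config = new Configuration();
        config.set("mongo.output.uri", "mongodb://localhost:27017/mydb.output");

        // A tiny key/value RDD; the BSON value is the document to store.
        BSONObject doc = new BasicBSONObject("greeting", "hello");
        JavaPairRDD<Object, BSONObject> out = sc.parallelizePairs(
                Arrays.asList(new Tuple2<Object, BSONObject>("doc-1", doc)));

        // The output path is required by the save API, but the data itself
        // should end up in the collection configured above.
        out.saveAsNewAPIHadoopFile("file:///tmp/unused", Object.class,
                BSONObject.class, MongoOutputFormat.class, config);
    }
}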
