Hi,

We're starting to build an analytics framework for our wellness service.
While our data is not yet Big, we'd like to use a framework that will
scale as needed, and Spark seems to be the best around.

I'm new to Hadoop and Spark, and I'm having difficulty figuring out how to
use Spark with MongoDB.  Apparently I should be able to use the
mongo-hadoop connector (https://github.com/mongodb/mongo-hadoop) with
Spark as well, but I haven't figured out how.

I've run through the Spark tutorials and have been able to set up a
single-machine Hadoop system with the MongoDB connector, as instructed at
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
and
http://docs.mongodb.org/ecosystem/tutorial/getting-started-with-hadoop/

Could someone give some instructions or pointers on how to configure and
use the mongo-hadoop connector with Spark?  I haven't been able to find any
documentation about this.
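For concreteness, here is the kind of thing I've been trying, pieced
together from the Spark API docs and the connector's Hadoop examples.
The class names (MongoInputFormat, BSONObject) and the "mongo.input.uri"
configuration key are my best guesses from the mongo-hadoop sources, and
"db.collection" is just a placeholder, so please correct any of this:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.SparkContext
import org.bson.BSONObject
// From the mongo-hadoop connector jar (must be on the Spark classpath):
import com.mongodb.hadoop.MongoInputFormat

object MongoSparkSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "MongoSparkSketch")

    // Hadoop configuration telling the connector which collection to read.
    val mongoConf = new Configuration()
    mongoConf.set("mongo.input.uri", "mongodb://localhost:27017/db.collection")

    // Read the collection as an RDD of (key, BSON document) pairs
    // through Spark's generic newAPIHadoopRDD entry point.
    val documents = sc.newAPIHadoopRDD(
      mongoConf,
      classOf[MongoInputFormat],
      classOf[Object],      // key: the document _id
      classOf[BSONObject])  // value: the document itself

    println("Document count: " + documents.count())
    sc.stop()
  }
}
```

Is this roughly the right approach, or is there a different intended way
to hook the connector into Spark?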


Thanks.


Best regards,
   Sampo N.
