Hi All,
I recently started learning Spark, and I need to use Spark Streaming.
1) Input: I need to read from MongoDB.
db.event_gcovs.find({executions:"56791a746e928d7b176d03c0", valid:1,
infofile:{$exists:1}, geo:"sunnyvale"}, {infofile:1}).count()
> Number of Info files: 24441
/* 0 */
{
    "_id" : ObjectId("568eaeda71404e5c563ccb86"),
    "infofile" : "/volume/testtech/datastore/code-coverage/p//infos/svl/6/56791a746e928d7b176d03c0/69958.pcp_napt44_20368.pl.30090.exhibit.R0-re0.15.1I20151218_1934_jammyc.pfe.i386.TC011.fail.FAIL.gcov.info"
}
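For reference, here is a hypothetical sketch of building the same filter in Python so it can be passed to pymongo's Collection.find. The collection name comes from the shell query above; the helper name, default argument, and database name are assumptions, not the actual job code.

```python
# Hypothetical sketch: mirror the mongo shell filter shown above in Python.
def build_infofile_query(execution_id, geo="sunnyvale"):
    """Match valid documents for one execution that carry an infofile path."""
    return {
        "executions": execution_id,
        "valid": 1,
        "infofile": {"$exists": 1},
        "geo": geo,
    }

# Usage with pymongo (requires a live MongoDB; "<db>" is an assumption):
#   from pymongo import MongoClient
#   coll = MongoClient()["<db>"]["event_gcovs"]
#   paths = [d["infofile"] for d in
#            coll.find(build_infofile_query("56791a746e928d7b176d03c0"),
#                      {"infofile": 1})]
```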
One info file can contain thousands of these blocks (each block starts at the "SF"
delimiter and ends with end_of_record).
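A minimal sketch of splitting one such info file into its blocks, assuming it is a plain-text LCOV-style trace where every record begins with an "SF:" line and ends with an "end_of_record" line (the function name is mine, not from any existing tool):

```python
def split_info_blocks(text):
    """Split the contents of a .info file into SF...end_of_record blocks."""
    blocks = []
    current = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("SF:"):
            current = [line]          # start of a new block
        elif line == "end_of_record":
            current.append(line)
            blocks.append("\n".join(current))
            current = []
        elif current:
            current.append(line)      # coverage data inside the current block
    return blocks
```

Each returned string is one self-contained block, so the list can be parallelized (for example with `sc.parallelize(blocks)`) before any per-block parsing.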