If those files aren't going to grow, then you can use the simple textFile
and do all your processing in a batch job.
Sample code is below:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._

    object SimpleApp {
      def main(args: Array[String]) {
        val sc = new SparkContext("local", "Simple HDFS App",
          "/home/akhld/mobi/spark-streaming/spark-0.8.0-incubating",
          List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))
        val textFile = sc.textFile("hdfs://127.0.0.1:54310/akhld/tweet1.json")
        textFile.take(10).foreach(println)
        sc.stop()
      }
    }
If they are growing, then I think you might want to use textFileStream or
fileStream, which will take care of processing new files as they appear.
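As a rough sketch of the streaming version (assuming the same Spark home and
jar paths as the batch example above; note that textFileStream only picks up
files created in the monitored directory *after* the stream starts):

    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._

    object SimpleStreamingApp {
      def main(args: Array[String]) {
        // 10-second batch interval; sparkHome and jar paths are the same
        // assumptions as in the batch example
        val ssc = new StreamingContext("local", "Simple HDFS Streaming App",
          Seconds(10),
          "/home/akhld/mobi/spark-streaming/spark-0.8.0-incubating",
          List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))

        // Watch an HDFS directory (not a single file); each new file that
        // lands there becomes part of the next batch
        val lines = ssc.textFileStream("hdfs://127.0.0.1:54310/akhld/")
        lines.print()

        ssc.start()
      }
    }

If you need a custom InputFormat or key/value types rather than plain text
lines, fileStream is the more general variant.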
-
AkhilDas
CodeBreach.in
- in.linkedin.com/in/akhildas/