If those files aren't going to grow, then you can use the simple textFile
and do all your processing in a batch job.
Sample code is below:

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._

    object SimpleApp {
      def main(args: Array[String]) {
        val sc = new SparkContext("local", "Simple HDFS App",
          "/home/akhld/mobi/spark-streaming/spark-0.8.0-incubating",
          List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))
        val textFile = sc.textFile("hdfs://127.0.0.1:54310/akhld/tweet1.json")
        textFile.take(10).foreach(println)
        sc.stop()
      }
    }
If they are growing, then I think you might want to use textFileStream or
fileStream, which will take care of processing new files as they appear.
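As a rough sketch of the streaming version (assuming the same Spark home and
jar paths as the batch example above; note that textFileStream only picks up
files created in the monitored directory *after* the stream starts):

    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.StreamingContext._

    object SimpleStreamingApp {
      def main(args: Array[String]) {
        // 10-second batch interval; sparkHome and jar paths are the same
        // assumptions as in the batch example
        val ssc = new StreamingContext("local", "Simple HDFS Streaming App",
          Seconds(10),
          "/home/akhld/mobi/spark-streaming/spark-0.8.0-incubating",
          List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))

        // Watch an HDFS directory (not a single file); each new file that
        // lands there becomes part of the next batch
        val lines = ssc.textFileStream("hdfs://127.0.0.1:54310/akhld/")
        lines.print()

        ssc.start()
      }
    }

If you need a custom InputFormat or key/value types rather than plain text
lines, fileStream is the more general variant.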
-
AkhilDas
CodeBreach.in
- in.linkedin.com/in/akhildas/