Thanks Akhil. In the above example, are you assuming that there is one tweet per line (i.e., tweets are newline-separated)?
On a related note, can you send pointers on how to run this standalone example? So far I've only played with the interactive spark-shell and have yet to run a standalone Scala program in cluster mode.

On Tue, Feb 4, 2014 at 12:38 PM, Akhil Das <[email protected]> wrote:

> If those files aren't going to grow, then you can use the simple textFile
> and do all your processing. Sample code is below:
>
> import org.apache.spark.SparkContext
> import org.apache.spark.SparkContext._
>
> object SimpleApp {
>   def main(args: Array[String]) {
>     val sc = new SparkContext("local", "Simple HDFS App",
>       "/home/akhld/mobi/spark-streaming/spark-0.8.0-incubating",
>       List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))
>
>     val textFile = sc.textFile("hdfs://127.0.0.1:54310/akhld/tweet1.json")
>     textFile.take(10).foreach(println)
>   }
> }
>
> If they are growing, then I think you might want to use textFileStream or
> fileStream, which will take care of the processing of new files.
>
> - Akhil Das
>   CodeBreach.in
>   in.linkedin.com/in/akhildas/
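For the standalone question, a minimal sketch following the Spark 0.8.x quick-start flow. The project name, version, and artifact coordinates below are assumptions inferred from the Scala 2.9.3 / 0.8.0-incubating paths in the quoted code, not something stated in the thread:

```scala
// simple.sbt -- build definition placed at the project root,
// next to src/main/scala/SimpleApp.scala.
// Versions are assumptions matching the 0.8.0-incubating install above.
name := "Simple Project"

version := "1.0"

scalaVersion := "2.9.3"

// Pulls in the Spark core library so SparkContext compiles.
libraryDependencies += "org.apache.spark" %% "spark-core" % "0.8.0-incubating"

// Spark 0.8-era builds resolved some dependencies from the Akka repository.
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
```

With this in place, `sbt package` produces the jar referenced in the SparkContext constructor (target/scala-2.9.3/simple-project_2.9.3-1.0.jar), and `sbt run` launches the app locally. To run against a cluster instead of "local", the first SparkContext argument would be the standalone master URL, e.g. spark://<master-host>:7077 (host is a placeholder), and the jar list lets the workers fetch your code.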
