Thanks Akhil. In the above example, are you assuming that there is one tweet per line (i.e., tweets are newline-separated)?
On a related note, can you send pointers on how to run this standalone example? So far I've only played with the interactive spark-shell and have yet to run a standalone Scala program in cluster mode.

On Tue, Feb 4, 2014 at 12:38 PM, Akhil Das <[email protected]> wrote:

> If those files aren't going to grow, then you can use the simple textFile
> and do all your processing. Sample code is below:
>
> import org.apache.spark.SparkContext
> import org.apache.spark.SparkContext._
>
> object SimpleApp {
>   def main(args: Array[String]) {
>     val sc = new SparkContext("local", "Simple HDFS App",
>       "/home/akhld/mobi/spark-streaming/spark-0.8.0-incubating",
>       List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))
>
>     val textFile = sc.textFile("hdfs://127.0.0.1:54310/akhld/tweet1.json")
>     textFile.take(10).foreach(println)
>   }
> }
>
> If they are growing, then I think you might want to use textFileStream or
> fileStream, which will take care of the processing of new files.
>
> - Akhil Das
>   CodeBreach.in
>   in.linkedin.com/in/akhildas/
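For the standalone question, a minimal sketch following the Spark 0.8.x quick-start flow. The project name, version, and artifact coordinates below are assumptions inferred from the Scala 2.9.3 / 0.8.0-incubating paths in the quoted code, not something stated in the thread:

```scala
// simple.sbt -- build definition placed at the project root,
// next to src/main/scala/SimpleApp.scala.
// Versions are assumptions matching the 0.8.0-incubating install above.
name := "Simple Project"

version := "1.0"

scalaVersion := "2.9.3"

// Pulls in the Spark core library so SparkContext compiles.
libraryDependencies += "org.apache.spark" %% "spark-core" % "0.8.0-incubating"

// Spark 0.8-era builds resolved some dependencies from the Akka repository.
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
```

With this in place, `sbt package` produces the jar referenced in the SparkContext constructor (target/scala-2.9.3/simple-project_2.9.3-1.0.jar), and `sbt run` launches the app locally. To run against a cluster instead of "local", the first SparkContext argument would be the standalone master URL, e.g. spark://<master-host>:7077 (host is a placeholder), and the jar list lets the workers fetch your code.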
