I can try answering the question even if I am not Sanjeet ;) There isn't a simple way to do this. In fact, the ideal way to do it would be to create a new InputDStream (just like FileInputDStream <https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala>) in which you create Hadoop RDDs as SQS messages are received.
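Very roughly, and completely untested (the class name, SQS client usage and the assumption that each message body is an S3 path are just placeholders to show the shape), such an input stream might look something like:

import scala.collection.JavaConverters._
import com.amazonaws.services.sqs.AmazonSQSClient
import com.amazonaws.services.sqs.model.ReceiveMessageRequest
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{StreamingContext, Time}
import org.apache.spark.streaming.dstream.InputDStream

// Sketch only: polls SQS once per batch interval on the driver and turns
// the S3 paths carried in the message bodies into a text RDD.
class SQSInputDStream(ssc_ : StreamingContext, queueUrl: String)
  extends InputDStream[String](ssc_) {

  @transient private var sqs: AmazonSQSClient = _

  override def start(): Unit = { sqs = new AmazonSQSClient() }
  override def stop(): Unit = { if (sqs != null) sqs.shutdown() }

  // Called by the framework once per batch interval.
  override def compute(validTime: Time): Option[RDD[String]] = {
    val request = new ReceiveMessageRequest(queueUrl).withMaxNumberOfMessages(10)
    val messages = sqs.receiveMessage(request).getMessages.asScala
    if (messages.isEmpty) {
      None
    } else {
      // Assumes each message body is the S3 path of a newly uploaded file.
      val paths = messages.map(_.getBody)
      Some(context.sparkContext.textFile(paths.mkString(",")))
    }
  }
}

Then something like val files = new SQSInputDStream(ssc, queueUrl) should give you a DStream[String] of file contents that you can transform like any other stream.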
But stepping back, I want to understand why you want to integrate with Spark Streaming at all. If you already have a working system that runs Spark jobs when SQS sends a message about new files, then why use Spark Streaming at all? What is lacking in that implementation? Based on that we can decide whether it is worth going to the effort of implementing a new input stream.

TD

On Tue, Aug 5, 2014 at 12:45 AM, lalit1303 <la...@sigmoidanalytics.com> wrote:

> Hi Sanjeet,
>
> I have been using Spark Streaming for processing files present in S3 and
> HDFS.
> I am also using SQS messages for the same purpose as yours, i.e. as a
> pointer to an S3 file.
> As of now, I have a separate SQS job which receives messages from the SQS
> queue and gets the corresponding file from S3.
> Now, I want to integrate the SQS receiver with Spark Streaming, so that my
> Spark Streaming job would listen for new SQS messages and proceed
> accordingly.
> I was wondering if you found any solution to this. Please let me know if
> you did!
>
> In your above approach, you can achieve #4 in the following way:
> When you are passing a forEach function to be applied on each RDD of the
> DStream, you can pass along the information about the SQS message (like the
> ReceiptHandle used for deleting the message) associated with that
> particular file.
> After success/failure in processing, you can then delete your SQS message
> accordingly.
>
>
> Thanks
> --Lalit
>
>
>
> -----
> Lalit Yadav
> la...@sigmoidanalytics.com
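For the ReceiptHandle idea Lalit describes above, a rough (again untested) sketch, assuming the input stream has been changed to emit (s3Path, receiptHandle) pairs instead of file contents, and with placeholder names throughout:

import scala.util.{Failure, Success, Try}
import com.amazonaws.services.sqs.AmazonSQSClient
import org.apache.spark.streaming.dstream.DStream

// Sketch only: process each file referenced in the batch and delete the
// corresponding SQS message only if processing succeeded.
def processAndAck(pathsAndHandles: DStream[(String, String)], queueUrl: String): Unit = {
  pathsAndHandles.foreachRDD { rdd =>
    // foreachRDD runs on the driver; collect() is cheap here because each
    // batch only carries a few (path, receiptHandle) pairs, not file data.
    val sqs = new AmazonSQSClient()
    rdd.collect().foreach { case (path, receiptHandle) =>
      Try {
        rdd.sparkContext.textFile(path).count()  // stand-in for the real processing
      } match {
        case Success(_) => sqs.deleteMessage(queueUrl, receiptHandle)
        case Failure(e) => System.err.println(s"Leaving $path on the queue: $e")
      }
    }
  }
}

Messages whose processing fails are simply not deleted, so they reappear after the visibility timeout and get retried on a later batch.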