All, I have the following methods in my Scala code, currently executed on demand:

```scala
val files = sc.binaryFiles("file:///imocks/data/ocr/raw")
// The line above picks up all the PDF files
files.map(myconverter(_)).count
```

`myconverter` signature:

```scala
def myconverter(file: (String, org.apache.spark.input.PortableDataStream)): Unit = {
  // Code to interact with IBM Datamap OCR, which converts the PDF files into text
}
```

I want to change the above code to Spark Streaming. Unfortunately, StreamingContext has no `binaryFiles` function (it would definitely be a great addition to Spark). The closest I can think of is this:

```scala
// Assuming myconverter is not changed
val dstream = ssc.fileStream[BytesWritable, BytesWritable, SequenceFileAsBinaryInputFormat]("file:///imocks/data/ocr/raw")
dstream.map(myconverter(_))
```

Unfortunately, nothing works now: the compiler reports errors such as the method signature not matching. Can anyone please help me get past this issue? I appreciate your help.

Also, wouldn't it be an excellent idea to make all methods of SparkContext reusable from StreamingContext as well? That way, converting a batch program to a streaming app would take no extra effort.

Best,
Passion
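One possible direction, sketched under assumptions and not tested against this setup. The signature errors arise because the DStream's elements are `(BytesWritable, BytesWritable)` pairs, while `myconverter` expects `(String, PortableDataStream)`; in addition, `SequenceFileAsBinaryInputFormat` reads Hadoop SequenceFiles rather than raw PDFs, and a bare `map` on a DStream is not an output operation, so it never runs. The sketch below assumes Hadoop 2+ (the `org.apache.hadoop.mapreduce` API), PDFs smaller than 2 GB, and that `myconverter` can be adapted to accept a file path plus its raw bytes. `WholeFileInputFormat`, `WholeFileRecordReader`, `OcrStream`, `myconverterBytes`, and `wire` are hypothetical names, not part of Spark or Hadoop.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{BytesWritable, IOUtils, Text}
import org.apache.hadoop.mapreduce.{InputSplit, JobContext, RecordReader, TaskAttemptContext}
import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, FileSplit}
import org.apache.spark.streaming.StreamingContext

// Hypothetical new-API input format: one record per file, (path, whole file contents),
// roughly what sc.binaryFiles yields in batch mode.
class WholeFileInputFormat extends FileInputFormat[Text, BytesWritable] {
  // Never split a file, so each PDF arrives as a single record.
  override protected def isSplitable(context: JobContext, file: Path): Boolean = false

  override def createRecordReader(split: InputSplit,
                                  context: TaskAttemptContext): RecordReader[Text, BytesWritable] =
    new WholeFileRecordReader
}

// Hypothetical reader that returns the entire file as one key/value pair.
class WholeFileRecordReader extends RecordReader[Text, BytesWritable] {
  private var split: FileSplit = _
  private var conf: Configuration = _
  private var processed = false
  private val key = new Text()
  private val value = new BytesWritable()

  override def initialize(s: InputSplit, context: TaskAttemptContext): Unit = {
    split = s.asInstanceOf[FileSplit]
    conf = context.getConfiguration
  }

  // First call reads the whole file; every later call reports "no more records".
  override def nextKeyValue(): Boolean =
    if (processed) false
    else {
      val path = split.getPath
      val fs = path.getFileSystem(conf)
      val contents = new Array[Byte](split.getLength.toInt) // assumes files < 2 GB
      val in = fs.open(path)
      try {
        IOUtils.readFully(in, contents, 0, contents.length)
      } finally {
        IOUtils.closeStream(in)
      }
      key.set(path.toString)
      value.set(contents, 0, contents.length)
      processed = true
      true
    }

  override def getCurrentKey: Text = key
  override def getCurrentValue: BytesWritable = value
  override def getProgress: Float = if (processed) 1.0f else 0.0f
  override def close(): Unit = ()
}

object OcrStream {
  // Hypothetical byte-based variant of myconverter; the OCR call itself is not shown.
  def myconverterBytes(path: String, bytes: Array[Byte]): Unit = {
    // e.g. wrap `bytes` in a java.io.ByteArrayInputStream and pass it to the OCR engine
  }

  // Hypothetical helper that wires the monitored directory to the converter.
  def wire(ssc: StreamingContext, dir: String): Unit = {
    val dstream = ssc.fileStream[Text, BytesWritable, WholeFileInputFormat](dir)
    // foreachRDD is an output operation; a bare dstream.map(...) would never execute.
    dstream.foreachRDD { rdd =>
      rdd.foreach { case (path, bytes) =>
        // Turn the Hadoop Writables into plain types right here, on the executor.
        myconverterBytes(path.toString, bytes.copyBytes())
      }
    }
  }
}
```

Usage would be along the lines of `OcrStream.wire(ssc, "file:///imocks/data/ocr/raw")` followed by `ssc.start()` and `ssc.awaitTermination()`. By default, `fileStream` only picks up files newly added to the monitored directory, so PDFs should be moved or renamed into it atomically once they are fully written.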