Hi,
I've got a weird question but maybe someone has already dealt with it.
My Spark Streaming application needs to
- download a file from a S3 bucket,
- run a script with the file as input,
- create a DStream from this script output.
I've already got the second part working with the rdd.pipe() API, which
fits my needs well, but I have no idea how to handle the first part.
How can I download a file and run a script on it inside a
Spark Streaming application?
Should I use Scala's sys.process API, or won't that work?
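In case it helps the discussion: one way to wire the three steps together is to fetch the file on the driver with plain JDK I/O (or let Spark read S3 directly via an s3n:// / s3a:// path if Hadoop S3 credentials are configured), turn it into an RDD, pipe that RDD through the script, and wrap the result in a ConstantInputDStream. This is only a sketch under those assumptions; the bucket URL, local path, and script path below are hypothetical placeholders.

```scala
import java.net.URL
import java.nio.file.{Files, Paths, StandardCopyOption}

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.ConstantInputDStream

object S3PipeSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("s3-pipe-sketch")
    val ssc  = new StreamingContext(conf, Seconds(10))
    val sc   = ssc.sparkContext

    // 1) Download the file on the driver (hypothetical bucket and key).
    val s3Url     = "https://my-bucket.s3.amazonaws.com/input.txt"
    val localPath = Paths.get("/tmp/input.txt")
    val in = new URL(s3Url).openStream()
    try Files.copy(in, localPath, StandardCopyOption.REPLACE_EXISTING)
    finally in.close()

    // Alternatively, let Spark read S3 directly (needs S3 credentials
    // in the Hadoop configuration):
    // val fileRdd = sc.textFile("s3n://my-bucket/input.txt")
    val fileRdd = sc.textFile(localPath.toString)

    // 2) pipe(): each partition's lines are written to the script's
    //    stdin; its stdout lines become the records of the new RDD.
    //    The script must exist at this path on every worker node.
    val piped = fileRdd.pipe("/path/to/script.sh")

    // 3) Wrap the piped RDD in a DStream; every batch re-emits it.
    val stream = new ConstantInputDStream(ssc, piped)
    stream.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

If the script is not pre-installed on the workers, sc.addFile(...) can ship it with the job and SparkFiles.get(...) resolves its path on each executor.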
Thanks
Gianluca