Re: Spark Streaming, download an S3 file to run a shell script on it

2014-06-07 Thread Mayur Rustagi
So you can run a plain job / Spark job to get the data onto disk/HDFS, then
run a DStream from an HDFS folder. As you move your files in, the DStream
will kick in.
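Something like this is the shape of it (a minimal sketch, assuming Spark
Streaming 1.x; the HDFS path and batch interval are placeholders, not from
this thread):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object HdfsFolderStream {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("HdfsFolderStream"), Seconds(10))
    // Monitors the folder; files moved in atomically after the stream
    // starts show up as new batches.
    val lines = ssc.textFileStream("hdfs:///data/incoming")
    lines.foreachRDD(rdd => println("got " + rdd.count() + " lines"))
    ssc.start()
    ssc.awaitTermination()
  }
}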
Regards
Mayur
On 6 Jun 2014 21:13, Gianluca Privitera 
gianluca.privite...@studio.unibo.it wrote:

  Where are the APIs for QueueStream and the RDD queue?
 In my solution I cannot open a DStream on the S3 location because I need to
 run a script on the file (a script that unfortunately doesn't accept stdin as
 input), so I have to download it to disk somehow and then handle it from
 there before creating the stream.

 Thanks
 Gianluca




Re: Spark Streaming, download an S3 file to run a shell script on it

2014-06-07 Thread Mayur Rustagi
The QueueStream example is in the Spark Streaming examples:
http://www.boyunjian.com/javasrc/org.spark-project/spark-examples_2.9.3/0.7.2/_/spark/streaming/examples/QueueStream.scala
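
The core pattern from that example, as a minimal sketch (simplified from the
linked file; class and value names are illustrative):

import scala.collection.mutable.Queue
import org.apache.spark.SparkConf
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{Seconds, StreamingContext}

object QueueStreamSketch {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("QueueStreamSketch"), Seconds(1))
    // Each RDD pushed into this queue becomes one batch of the DStream.
    val rddQueue = new Queue[RDD[Int]]()
    ssc.queueStream(rddQueue).map(_ * 2).print()
    ssc.start()
    // Producer side: push RDDs yourself, e.g. after fetching from S3.
    for (_ <- 1 to 30) {
      rddQueue.synchronized {
        rddQueue += ssc.sparkContext.makeRDD(1 to 1000, 10)
      }
      Thread.sleep(1000)
    }
    ssc.stop()
  }
}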


Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi



On Sat, Jun 7, 2014 at 6:41 PM, Mayur Rustagi mayur.rust...@gmail.com
wrote:

 So you can run a plain job / Spark job to get the data onto disk/HDFS, then
 run a DStream from an HDFS folder. As you move your files in, the DStream
 will kick in.
 Regards
 Mayur




Re: Spark Streaming, download an S3 file to run a shell script on it

2014-06-06 Thread Mayur Rustagi
You can look at creating a DStream directly from the S3 location using a file
stream. If you want to apply any specific logic, you can rely on QueueStream &
read the data yourself from S3, process it & push it into an RDD queue.
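
For the file-stream route, a minimal sketch (the bucket name is a
placeholder; s3n:// was the usual scheme at the time, and credentials go in
the Hadoop configuration or the URI depending on your setup):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object S3FileStream {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("S3FileStream"), Seconds(30))
    // New objects appearing under this prefix become batches of lines.
    val lines = ssc.textFileStream("s3n://my-bucket/incoming/")
    lines.print()
    ssc.start()
    ssc.awaitTermination()
  }
}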

Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi



On Fri, Jun 6, 2014 at 3:00 AM, Gianluca Privitera 
gianluca.privite...@studio.unibo.it wrote:

 Hi,
 I've got a weird question but maybe someone has already dealt with it.
 My Spark Streaming application needs to
 - download a file from an S3 bucket,
 - run a script with the file as input,
 - create a DStream from this script output.
 I've already got the second part done with the rdd.pipe() API, which really
 fits my needs, but I have no idea how to manage the first part.
 How can I download a file and run a script on it inside a Spark
 Streaming application?
 Should I use process() from Scala, or won't that work?

 Thanks
 Gianluca




Re: Spark Streaming, download an S3 file to run a shell script on it

2014-06-06 Thread Gianluca Privitera

Where are the APIs for QueueStream and the RDD queue?
In my solution I cannot open a DStream on the S3 location because I need
to run a script on the file (a script that unfortunately doesn't accept
stdin as input), so I have to download it to disk somehow and then handle
it from there before creating the stream.
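
(For reference, the download-then-run step I mean would look roughly like
this minimal sketch, assuming the AWS SDK for Java on the classpath; the
bucket, key, env-var names and script path are all hypothetical:)

import java.io.File
import scala.sys.process._
import com.amazonaws.auth.BasicAWSCredentials
import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.GetObjectRequest

object FetchAndRun {
  def fetchAndRun(bucket: String, key: String): String = {
    val s3 = new AmazonS3Client(
      new BasicAWSCredentials(sys.env("AWS_ACCESS_KEY"), sys.env("AWS_SECRET_KEY")))
    val local = new File("/tmp/" + new File(key).getName)
    // Writes the S3 object straight to a local file (no stdin involved).
    s3.getObject(new GetObjectRequest(bucket, key), local)
    // Run the script on the downloaded file and capture its stdout.
    Seq("/path/to/myscript.sh", local.getAbsolutePath).!!
  }
}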


Thanks
Gianluca

On 06/06/2014 02:19, Mayur Rustagi wrote:
You can look at creating a DStream directly from the S3 location using a
file stream. If you want to apply any specific logic, you can rely on
QueueStream & read the data yourself from S3, process it & push it into
an RDD queue.


Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi https://twitter.com/mayur_rustagi







Spark Streaming, download an S3 file to run a shell script on it

2014-06-05 Thread Gianluca Privitera

Hi,
I've got a weird question but maybe someone has already dealt with it.
My Spark Streaming application needs to
- download a file from an S3 bucket,
- run a script with the file as input,
- create a DStream from this script output.
I've already got the second part done with the rdd.pipe() API, which really
fits my needs, but I have no idea how to manage the first part.
How can I download a file and run a script on it inside a Spark Streaming
application?

Should I use process() from Scala, or won't that work?
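
(The pipe part, for context, is roughly this minimal sketch: rdd.pipe()
feeds each partition's elements to the command's stdin, one per line, and
the command's stdout lines become the new RDD; paths are placeholders:)

import org.apache.spark.{SparkConf, SparkContext}

object PipeSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PipeSketch"))
    val input = sc.textFile("/tmp/downloaded-from-s3.txt")
    // Each partition is piped through the external script.
    val piped = input.pipe("/path/to/myscript.sh")
    piped.collect().foreach(println)
    sc.stop()
  }
}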

Thanks
Gianluca