In general, you may have multiple steps in a workflow. For every batch:
1. stream data from S3
2. write it to HBase
3. execute a Hive step using the data in S3
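The three per-batch steps above can be sketched as plain orchestration logic. This is a minimal pure-Python sketch with stubbed I/O: the names `stream_from_s3`, `write_to_hbase`, and `run_hive_step` are hypothetical placeholders, not real client calls (in practice they would wrap boto3, an HBase client such as happybase, and a Hive/EMR step submission).

```python
def stream_from_s3(batch_id):
    # Stub: a real pipeline would read this batch's objects from S3.
    return [f"record-{batch_id}-{i}" for i in range(3)]

def write_to_hbase(records):
    # Stub: a real pipeline would write via an HBase client or connector.
    return len(records)

def run_hive_step(batch_id):
    # Stub: a real pipeline would submit a Hive query/step over the S3 data.
    return f"hive-step-{batch_id}-ok"

def process_batch(batch_id):
    """One unit of the workflow: S3 -> HBase -> Hive, in order."""
    records = stream_from_s3(batch_id)   # step 1
    written = write_to_hbase(records)    # step 2
    status = run_hive_step(batch_id)     # step 3
    return written, status

written, status = process_batch(42)
print(written, status)  # 3 hive-step-42-ok
```

An orchestrator's job is then just to run `process_batch` per batch and handle retries/dependencies between the three steps.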
In this case all these three steps are part of the workflow; that is why I mentioned workflow orchestration. The other question (2) is about how to manage the clusters without any downtime or data loss (especially when you want to bring down the cluster and create a new one for running Spark Streaming).

> On Jun 22, 2016, at 10:17 AM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
> Hi Pandees,
>
> Can you kindly explain what you are trying to achieve by incorporating Spark Streaming with workflow orchestration? Is this some form of back-to-back seamless integration?
>
> I have not used it myself but would be interested in knowing more about your use case.
>
> Cheers,
>
> Dr Mich Talebzadeh
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
>> On 22 June 2016 at 15:54, pandees waran <pande...@gmail.com> wrote:
>> Hi Mich, please let me know if you have any thoughts on the below.
>>
>> ---------- Forwarded message ----------
>> From: pandees waran <pande...@gmail.com>
>> Date: Wed, Jun 22, 2016 at 7:53 AM
>> Subject: spark streaming questions
>> To: user@spark.apache.org
>>
>> Hello all,
>>
>> I have a few questions regarding Spark Streaming:
>>
>> * I am wondering whether anyone uses Spark Streaming with workflow orchestrators such as Data Pipeline/SWF/any other framework. Are there any advantages/drawbacks to using a workflow orchestrator for Spark Streaming?
>>
>> * How do you manage the cluster (bringing down/creating a new cluster) without any data loss in streaming?
>>
>> I would like to hear your thoughts on this.
>>
>> --
>> Thanks,
>> Pandeeswaran
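On question (2), one common pattern for replacing a streaming cluster without data loss is to checkpoint the last processed position durably and have the replacement cluster resume from it; Spark Streaming's checkpointing and graceful shutdown rest on this idea. A minimal pure-Python sketch of the principle, with an in-memory dict standing in for a durable checkpoint store and a list standing in for a replayable source (in practice: S3 or ZooKeeper for the checkpoint, Kafka or Kinesis for the source):

```python
# In-memory stand-ins for durable checkpoint storage and a replayable source.
checkpoint_store = {"offset": 0}
source = list(range(10))  # a replayable stream of 10 records

def run_cluster(max_records):
    """Process up to max_records, committing the position after each one."""
    processed = []
    offset = checkpoint_store["offset"]
    for record in source[offset:offset + max_records]:
        processed.append(record)
        offset += 1
        checkpoint_store["offset"] = offset  # commit position durably
    return processed

first = run_cluster(4)     # old cluster processes 4 records, then is torn down
second = run_cluster(100)  # new cluster resumes from the checkpoint
print(first, second)       # [0, 1, 2, 3] [4, 5, 6, 7, 8, 9]
```

Because the second "cluster" starts from the committed offset rather than from zero, no records are lost or reprocessed across the swap. This only works if the source is replayable; data read from a non-replayable socket and not yet checkpointed would still be lost.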