Thanks, Andrew. I agree that decoupling the components is a good solution from a long-term perspective. My current data pipeline in NiFi is designed for batch processing, and I am trying to convert it to a streaming model.
One of the processors in the data pipeline invokes a Spark job. Once the job finishes, control returns to the NiFi processor, which in turn generates a provenance event for the job. This provenance event is important for us. Keeping the batch architecture in mind, I want to design a Spark Streaming based model in which a NiFi Spark Streaming processor processes each micro-batch and the job status is returned to NiFi along with a provenance event. I can then capture that provenance data for my reports. Essentially, I would use NiFi to capture provenance events while the actual processing is done by the Spark Streaming job. Does this approach seem logical to you?

Thanks,
Shashi

On Sun, Jun 4, 2017 at 3:10 PM, Andrew Psaltis wrote:

> Hi Shashi,
> I'm sure there is a way to make this work. However, my first question is
> why you would want to? By design a Spark Streaming application should
> always be running and consuming data from some source, hence the notion of
> streaming. Tying Spark Streaming to NiFi would ultimately result in a more
> coupled and fragile architecture. Perhaps a different way to think about it
> would be to set things up like this:
>
> NiFi --> Kafka <-- Spark Streaming
>
> With this you can do what you are doing today -- using NiFi to ingest,
> transform, make routing decisions, and feed data into Kafka. In essence you
> would be using NiFi to do all the preparation of the data for Spark
> Streaming. Kafka would serve the purpose of a buffer between NiFi and Spark
> Streaming. Finally, Spark Streaming would ingest data from Kafka and do
> what it is designed for -- stream processing. Having a decoupled
> architecture like this also allows you to manage each tier separately, thus
> you can tune, scale, develop, and deploy all separately.
>
> I know I did not directly answer your question on how to make it work.
> But, hopefully this helps provide an approach that will be a better long
> term solution. There may be something I am missing in your initial
> questions.
>
> Thanks,
> Andrew
>
> On Sat, Jun 3, 2017 at 10:43 PM, Shashi Vishwakarma wrote:
>
>> Hi,
>>
>> I am looking for a way to use Spark Streaming in NiFi. I have seen a
>> couple of posts where a Site-to-Site TCP connection is used for the Spark
>> Streaming application, but I think it would be better if I could launch
>> Spark Streaming from a custom NiFi processor.
>>
>> PublishKafka will publish messages into Kafka, and then a NiFi Spark
>> Streaming processor will read from the Kafka topic.
>>
>> I can launch a Spark Streaming application from a custom NiFi processor
>> using the Spark launcher API, but the biggest challenge is that it will
>> create a Spark Streaming context for each flow file, which can be a costly
>> operation.
>>
>> Can anyone suggest whether to store the Spark Streaming context in a
>> controller service? Or is there a better approach for running a Spark
>> Streaming application with NiFi?
>>
>> Thanks and Regards,
>> Shashi
>
> --
> Thanks,
> Andrew
>
> Subscribe to my book: Streaming Data <http://manning.com/psaltis>
> <https://www.linkedin.com/pub/andrew-psaltis/1/17b/306>
> Twitter: @itmdata <http://twitter.com/intent/user?screen_name=itmdata>
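A minimal, in-memory sketch of the decoupled layout Andrew describes (NiFi --> Kafka <-- Spark Streaming), assuming Python's standard `queue.Queue` as a stand-in for the Kafka topic. The names `ingest_tier`, `stream_tier`, and `run_pipeline` are hypothetical, and nothing here calls the NiFi, Kafka, or Spark APIs; unlike Kafka, the queue is neither durable nor replayable. It only illustrates the point that the ingest side and the micro-batch consumer share nothing except the buffer, so each tier can be started, tuned, or replaced on its own.

```python
import queue
import threading

# Hypothetical stand-ins for the tiers; none of these call NiFi, Kafka, or Spark.
_END = object()  # end-of-stream marker so the consumer knows when to stop


def ingest_tier(records, buffer):
    """Plays the NiFi role: lightly transform each record and publish it to the buffer."""
    for record in records:
        buffer.put(record.strip().lower())
    buffer.put(_END)


def stream_tier(buffer, batch_size, results):
    """Plays the Spark Streaming role: drain the buffer and process it in micro-batches."""
    batch = []
    while True:
        item = buffer.get()
        if item is _END:
            break
        batch.append(item)
        if len(batch) == batch_size:
            results.append(batch)
            batch = []
    if batch:  # flush the final, possibly partial, micro-batch
        results.append(batch)


def run_pipeline(records, batch_size=2):
    # The in-memory queue stands in for a Kafka topic: the only thing the tiers share.
    buffer = queue.Queue()
    results = []
    consumer = threading.Thread(target=stream_tier, args=(buffer, batch_size, results))
    producer = threading.Thread(target=ingest_tier, args=(records, buffer))
    consumer.start()
    producer.start()
    producer.join()
    consumer.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([" A", "b ", "C", "d", "E"], batch_size=2))
```

For example, `run_pipeline([" A", "b ", "C", "d", "E"], batch_size=2)` returns `[['a', 'b'], ['c', 'd'], ['e']]`. In a real deployment the producer and consumer would be separate processes (a NiFi flow and a long-running Spark job), which is exactly what lets them be scaled and deployed independently.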
