Ben,

To elaborate on NIFI-190: this ticket introduced two new processors (Wait and Notify) that use the DistributedMapCacheServer to communicate. They aren't released yet, but are in the master branch.
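To make the communication through the cache a bit more concrete, here is a rough Python sketch of the idea. This is not NiFi code and not how the processors are implemented: `SignalCache`, `notify`, `wait_for_signals`, and the status strings are names I made up for illustration, and an in-memory dict stands in for the DistributedMapCacheServer. The assumption is simply that Notify increments a counter (and may store attributes) under a shared identifier, and Wait checks that counter against a target until it is reached or an expiration passes.

```python
import time


class SignalCache:
    """In-memory stand-in for the shared map cache (hypothetical, not the NiFi API)."""

    def __init__(self):
        self._entries = {}

    def notify(self, signal_id, attributes=None):
        """What a Notify step would do: bump the signal count, optionally add attributes."""
        entry = self._entries.setdefault(signal_id, {"count": 0, "attributes": {}})
        entry["count"] += 1
        entry["attributes"].update(attributes or {})
        return entry["count"]

    def get(self, signal_id):
        """Return the current entry, or an empty one if nothing has been signalled yet."""
        return self._entries.get(signal_id, {"count": 0, "attributes": {}})


def wait_for_signals(cache, signal_id, target_count, expires_at, now=time.monotonic):
    """One Wait-style check against the cache.

    Returns (status, attributes), where status is "success" once target_count
    signals are present, "expired" once the deadline has passed, else "wait".
    Copying attributes on expiry is an assumption of this sketch.
    """
    entry = cache.get(signal_id)
    if entry["count"] >= target_count:
        return "success", dict(entry["attributes"])
    if now() >= expires_at:
        return "expired", dict(entry["attributes"])
    return "wait", {}


if __name__ == "__main__":
    cache = SignalCache()
    urls = ["http://a.example", "http://b.example", "http://c.example"]
    deadline = time.monotonic() + 60

    # Before any split has been processed, the original has to keep waiting.
    print(wait_for_signals(cache, "csv-1", len(urls), deadline)[0])

    # Each processed split signals once and contributes its "result" attribute.
    for i, url in enumerate(urls):
        cache.notify("csv-1", {f"result.{i}": f"looked up {url}"})

    # With N signals present, the original is released along with the attributes.
    print(wait_for_signals(cache, "csv-1", len(urls), deadline))
```

In the real flow the "wait" case would just mean the original flow file gets checked again later; the sketch only shows a single check.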
One example of using these processors is something like the following:

- Let's say we have a flow file whose content is a CSV file, and each line is a URL to look up somewhere
- The flow file can be sent to a SplitText processor to get each line into its own flow file
- The "original" relationship from SplitText can go to a Wait processor, which will keep checking the cache for N signals (in this case N = the number of splits)
- The "splits" relationship would go down a separate path where each split would be processed and eventually hit a Notify processor, which would increment the number of signals in the cache and optionally add attributes
- When Wait sees N signals (or when an expiration is reached), it releases the original flow file and can optionally copy over attributes that the signals put in the cache

So you get to continue processing the original flow file that is the CSV, while still being able to process the splits individually and get the attributes from them that might be the "results" in your case.

Hope that helps.

-Bryan

On Thu, Jan 26, 2017 at 3:16 PM, Joe Witt <[email protected]> wrote:
> Ben,
>
> One way to approach this is using the sort of capabilities this opens
> up: https://issues.apache.org/jira/browse/NIFI-190
>
> Certainly is a good case/idea to work through. Doable and
> increasingly seems to be an ask.
>
> Thanks
> Joe
>
> On Thu, Jan 26, 2017 at 3:10 PM, Benjamin Janssen <[email protected]> wrote:
> > Hello all,
> >
> > I've got a use case where I get some data, I want to fork a portion of
> > that data off to an external service for asynchronous processing, and
> > when that external service has finished processing the data, I want to
> > take its output, marry it up with the original data, and pass the whole
> > thing on for further processing.
> >
> > So essentially two data flows:
> >
> > Receive Data -> Store Some State -> Send Data To External Service
> >
> > Do More Processing On Original Data + Results <- Retrieve Previously Stored State <- Receive Results From External Service
> >
> > Is there a way to do this while taking advantage of NiFi's State
> > Management capabilities? I wasn't finding any obvious processors for
> > persisting and retrieving shared state from State Management. The
> > closest my googling was able to get me was this:
> > https://issues.apache.org/jira/browse/NIFI-1582
> > but if I'm understanding the State Management documentation properly,
> > that won't actually help me because I'd need the same processor to do
> > all storing and retrieving of state?
> >
> > Does something exist to use State Management like this? Or is what I'm
> > proposing to do a bad idea?
> >
> > Or maybe I should just be using the DistributedMapCacheServer for this?
> >
> > Any help/advice would be appreciated.
>
