Also, this is a great use case, and one that has been addressed quite a bit in the past using exactly the sort of logic Bryan calls out. We've also done things like write custom controller services specific to the type of data and data structures needed for the job. Either way, the plumbing/infrastructure for it is well supported: it avoids the RPC calls you mention, keeps the cache frequently updated live, and lets the cache be used by numerous components at once.
Thanks
Joe

On Thu, Aug 25, 2016 at 9:57 AM, Bryan Bende <[email protected]> wrote:
> Hi Mike,
>
> I think one approach might be the following...
>
> Set up controller services for DistributedMapCacheServer and
> DistributedMapCacheClient, then have a part of your flow that is
> triggered periodically and queries your Hive table (you will probably
> need to split/parse the results), and then use the
> PutDistributedMapCache processor to store the results in the cache.
>
> In the other part of your flow, use FetchDistributedMapCache to do a
> lookup against the cache.
>
> I haven't worked through all of the exact steps, but I think something
> like that should work.
>
> Thanks,
>
> Bryan
>
> On Thu, Aug 25, 2016 at 6:38 AM, Mike Harding <[email protected]>
> wrote:
>> Hi All,
>>
>> I have a mapping table stored in Hive that maps an ID to a readable
>> name string. When a JSON object enters my NiFi pipeline as a flowfile,
>> I want to be able to inject the readable name string into the JSON
>> object. The problem is that currently, as each flowfile enters the
>> pipe, I have to make a SelectHiveQL call to first get the lookup table
>> data and store it as attributes.
>>
>> Is there a way I can load the lookup table data once, or on a periodic
>> basis, into NiFi (as a global variable/attribute) to save having to
>> make the select call for each flowfile, which translates to 1000's of
>> calls a minute?
>>
>> Thanks,
>> Mike
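For anyone following along, the pattern Bryan describes can be sketched in plain Python. This is only a hedged illustration of the idea, not NiFi code: the dict stands in for the DistributedMapCacheServer, `refresh_cache` plays the role of the periodic SelectHiveQL -> split/parse -> PutDistributedMapCache branch, and `enrich` plays the role of the per-flowfile FetchDistributedMapCache lookup. All names here are made up for the sketch.

```python
import json

# Stand-in for NiFi's DistributedMapCacheServer: a shared key/value store
# that many parts of the flow can read at once.
cache = {}

def refresh_cache(rows):
    """Periodic refresh branch: rows come from the Hive mapping table
    (id -> readable name). In NiFi this would be SelectHiveQL, then
    split/parse, then PutDistributedMapCache."""
    for row_id, name in rows:
        cache[str(row_id)] = name

def enrich(flowfile_json):
    """Per-flowfile branch: look up the readable name in the cache
    (FetchDistributedMapCache) and inject it into the JSON, with no
    Hive call on the hot path."""
    record = json.loads(flowfile_json)
    record["name"] = cache.get(str(record["id"]), "unknown")
    return json.dumps(record)

# Refresh once (or on a timer), then enrich many flowfiles cheaply.
refresh_cache([(1, "Sensor A"), (2, "Sensor B")])
print(enrich('{"id": 1, "value": 42}'))
```

The key point is the split: thousands of `enrich` calls per minute touch only the in-memory cache, while the Hive query runs on its own schedule.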
