Also, this is a great use case that has been handled quite a bit in the
past using exactly the sort of logic Bryan calls out.  We've also done
things like writing custom controller services specific to the type of
data and data structures needed for the job.  The
plumbing/infrastructure for this is well supported: it avoids the RPC
calls you mention, ensures the cache is frequently updated live, and
lets the cache be used by numerous components at once.
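As a rough illustration of the pattern being discussed (outside of NiFi), the idea is to refresh the ID-to-name mapping in bulk on a schedule and enrich each incoming JSON record from the in-memory copy, rather than issuing one Hive query per flowfile. This is only a sketch: the Hive query is mocked, the refresh interval is a made-up value, and in NiFi the cache would live in a DistributedMapCacheServer rather than a module-level dict.

```python
# Sketch of the lookup-cache pattern: periodic bulk refresh of an
# ID -> readable-name mapping, then per-record enrichment from the
# cached copy instead of a per-record database call.
import json
import time

REFRESH_SECONDS = 300  # hypothetical refresh interval

_cache = {}
_last_refresh = 0.0

def load_mapping_from_hive():
    # Placeholder for a SelectHiveQL-style bulk query;
    # returns {id: readable_name}.
    return {"42": "Pump Station A", "43": "Pump Station B"}

def get_name(record_id):
    global _cache, _last_refresh
    now = time.time()
    if not _cache or now - _last_refresh > REFRESH_SECONDS:
        _cache = load_mapping_from_hive()  # one bulk query, not one per record
        _last_refresh = now
    return _cache.get(record_id)

def enrich(flowfile_json):
    # Inject the readable name into the JSON record, if known.
    record = json.loads(flowfile_json)
    name = get_name(record.get("id"))
    if name is not None:
        record["name"] = name
    return json.dumps(record)
```

The same shape maps onto Bryan's suggestion below: the periodic refresh becomes a scheduled flow writing via PutDistributedMapCache, and the lookup becomes FetchDistributedMapCache.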

Thanks
Joe

On Thu, Aug 25, 2016 at 9:57 AM, Bryan Bende <[email protected]> wrote:
> Hi Mike,
>
> I think one approach might be the following...
>
> Set up controller services for DistributedMapCacheServer and
> DistributedMapCacheClient, then have a part of your flow that is triggered
> periodically to query your Hive table (you'll probably need to split/parse
> the results), and then use the PutDistributedMapCache processor to store
> them in the cache.
>
> In the other part of your flow, use FetchDistributedMapCache to do a lookup
> against the cache.
>
> I haven't worked through all of the exact steps, but I think something like
> that should work.
>
> Thanks,
>
> Bryan
>
> On Thu, Aug 25, 2016 at 6:38 AM, Mike Harding <[email protected]>
> wrote:
>>
>> Hi All,
>>
>> I have a mapping table stored in Hive that maps an ID to a readable name
>> string. When a JSON object enters my NiFi pipeline as a flowfile, I want to
>> be able to inject the readable name string into the JSON object. The
>> problem is that currently, as each flowfile enters the pipe, I have to make
>> a SelectHiveQL call to first get the lookup table data and store it as
>> attributes.
>>
>> Is there a way I can load the lookup table data once, or on a periodic
>> basis, into NiFi (as a global variable/attribute) to save having to make
>> the select call for each flowfile, which translates to thousands of calls
>> a minute?
>>
>> Thanks,
>> Mike
>
>
