Re: Efficiently caching API results in a NiFi controller service

Tim Dean Tue, 01 May 2018 07:10:36 -0700

Thanks Bryan -

Should I assume that my service’s local map needs to be thread-safe, or 
would all service calls likely be executed from within a single thread? I 
assume the former but want to be sure.


Assuming that thread-safety is needed, it seems like I should be using 
something like ConcurrentHashMap for my cache, correct?
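
For concreteness, here is a minimal sketch of the kind of cache I have in
mind, assuming string keys and string response bodies. The names
ApiResultCache and fetcher are hypothetical and not part of the NiFi API;
the fetcher stands in for whatever code performs the HTTP request.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/**
 * Hypothetical thread-safe local cache for a controller service.
 * Keys and values are plain strings here purely for illustration.
 */
class ApiResultCache {

    // ConcurrentHashMap allows concurrent reads and writes without external locking.
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();

    // Assumed stand-in for the code that calls the external web service.
    private final Function<String, String> fetcher;

    ApiResultCache(Function<String, String> fetcher) {
        this.fetcher = fetcher;
    }

    /**
     * Returns the cached value for the key, invoking the fetcher only when the
     * key is absent. If the fetcher returns null, nothing is cached.
     */
    String get(String key) {
        return cache.computeIfAbsent(key, fetcher);
    }

    /** Explicitly empties the cache, e.g. when the remote data has changed. */
    void clear() {
        cache.clear();
    }

    /** Number of entries currently cached. */
    int size() {
        return cache.size();
    }
}
```

One trade-off to be aware of in this sketch: ConcurrentHashMap runs the
function passed to computeIfAbsent atomically, so other updates to the same
part of the map may block while a slow HTTP call is in progress. I would
presumably call clear() from the service's methods annotated with
@OnEnabled and @OnDisabled.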

-Tim

> On May 1, 2018, at 8:07 AM, Bryan Bende <[email protected]> wrote:
> 
> Tim,
> 
> The reason the DMC works the way it does is that the cached data
> needs to be shared across a cluster. For example, a processor like
> DetectDuplicate needs to detect duplicates across all NiFi nodes and
> not just the local node, and the same applies to Wait/Notify.
> 
> In your case I don't think you have the need to share data across
> nodes, so each NiFi node can have an instance of your controller
> service which could have a HashMap as you described.
> 
> You could definitely clear the map on enabled/disabled, and you could
> also implement time-based strategies: for example, if a cached value is
> older than a certain threshold, remove it and re-fetch. It is
> really up to how you use the service.
> 
> I don't see any issues with memory as long as your cache doesn't grow
> indefinitely.
> 
> -Bryan
> 
> 
> On Tue, May 1, 2018 at 6:47 AM, Otto Fowler <[email protected]> wrote:
>> https://hc.apache.org/httpcomponents-client-ga/tutorial/html/caching.html ?
>> 
>> 
>> On May 1, 2018 at 00:01:58, Tim Dean ([email protected]) wrote:
>> 
>> Hello,
>> 
>> I have a custom NiFi controller service that retrieves data from an external
>> web service via HTTP requests. The results from these HTTP requests will be
>> needed at various points throughout my process flow. In some situations, I
>> could end up needing to access the HTTP response dozens or even hundreds of
>> times.
>> 
>> Given that the results of the HTTP request rarely change, I’d like them to
>> be cached by my service and returned to my processors when needed. I’d need
>> some way to explicitly clear the cache for those occasions when the data in
>> the service does change.
>> 
>> I’ve looked at using the DistributedMapCacheClientService implementation to
>> cache my web service’s results, but it seems like that connects to a server
>> via a socket connection and that doesn’t seem like it would be all that much
>> more efficient than calling the web service directly. I’ve also looked at
>> using the service’s state manager to store the results as state, but my data
>> is a little more complex than what the documentation for state suggests is
>> optimal: I don’t think my total map size will reach 1MB, but it is
>> possible.
>> 
>> Am I overthinking this? Would a simpler solution like creating a simple Java
>> HashMap inside my controller service be adequate? I could empty the contents
>> of the hash map whenever the controller service is enabled/disabled. Would
>> the memory used by this kind of simplified local caching cause problems
>> somewhere down the line?
>> 
>> Are there other caching strategies I should be considering?
>> 
>> Thanks
>> 
>> -Tim
>> 
>>
