Indus,

Let's project out one order of magnitude. That puts you at a bit more than 500 lookups per second. You will want to consider the properties of the data itself and the properties of the database, as these drive the appropriate caching and querying strategy. If the data as-is lends itself to highly effective caching behavior, you're in good shape. If not, you may choose to intentionally merge/batch data together to achieve higher cache hit rates. As for the database itself, if it is updated frequently you'll need a tighter cache refresh policy, and so on.

Such cases, across a fairly wide combination of scenarios, have been played out to great effect in NiFi. Some of the tools you'll need are available out of the box; others you may need to build for your particular database or caching strategy, or if you want to edit the JSON data based on the results of the database lookup.
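To make the "tighter cache refresh policy" idea concrete, here is a minimal sketch of a time-to-live cache (a hypothetical illustration, not a NiFi component; the class name and API are made up for this example):

```python
import time


class TTLCache:
    """Minimal time-to-live cache: entries expire after ttl_seconds.

    A frequently updated database calls for a shorter TTL; a mostly
    static lookup table can tolerate a much longer one.
    """

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None  # never cached
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # stale: force a fresh database lookup
            return None
        return value

    def put(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)
```

Tuning `ttl_seconds` is the knob: the faster the underlying table changes, the smaller you make it, trading database load for freshness.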
So the general flow logic would be like this (not specific to any existing processor):

1) Consume the JSON object.
2) Extract the username from the JSON object as a flow file attribute.
3) Execute a SQL statement to look up the query results for that username. This could be a single processor with a built-in caching mechanism, or a processor that first checks a cache and, on a miss, queries the actual database and then populates the cache.
4) The attributes of the flow file now contain the results of the query. Update the JSON object content if necessary using the attributes, or simply make an attribute-based routing decision.

The key part of this is really step 3. There are many ways to go after that.

Thanks
Joe

On Sat, Nov 14, 2015 at 12:46 PM, indus well <[email protected]> wrote:

> Hi NiFi Experts:
>
> I have a use case where I consume a Json message containing an userID and
> need to lookup the value from a database table. The transaction volume is
> averaging around 5 million per day and growing. What would be the best
> approach for this scenario.
>
> Thanks,
>
> Indus
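The lookup-then-populate behavior described in step 3 above is the classic cache-aside pattern. A minimal sketch, assuming a simple in-memory cache and a pluggable database query function (all names here are illustrative, not an actual NiFi processor API):

```python
class SimpleCache:
    """Toy in-memory cache standing in for whatever cache you deploy."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def put(self, key, value):
        self._store[key] = value


def lookup_user(username, cache, query_db):
    """Cache-aside lookup: check the cache first, fall back to the
    database on a miss, then populate the cache with the result."""
    result = cache.get(username)
    if result is not None:
        return result  # cache hit: no database round trip
    result = query_db(username)  # cache miss: query the real database
    if result is not None:
        cache.put(username, result)  # populate so the next hit is cheap
    return result
```

At 500+ lookups per second, the ratio of repeated usernames in the stream determines how much of that load the cache absorbs versus how much still reaches the database.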
