Hi Mike,

This seems like the correct approach when using out-of-the-box processors.
You could potentially create a custom processor that performs all three of those steps in one: take JSON as input, extract a value, query Hive, merge the results, and write them to the flowfile. Normally I would think that the complexity of querying Hive might not be worth it, but you would be able to re-use the HiveDBCPConnectionPool service and would just have to run the query.

-Bryan

On Fri, Sep 30, 2016 at 8:12 AM, Mike Harding <[email protected]> wrote:
> Hi All,
>
> I have a NiFi data flow that receives flowfiles, each containing a JSON
> object. As part of the transformation of each flowfile I want to query a
> Hive table, using a property in the flowfile's JSON content, to retrieve
> additional information that I then want to inject into the flowfile. The
> updated flowfile is then passed on to the next processor downstream.
>
> Currently the only way I can think of to do this is to:
>
> 1 - Put the flowfile's JSON object into attributes using the
> EvaluateJsonPath processor.
>
> 2 - Pass the flowfile to a SelectHiveQL processor, which runs the query
> (using the property from the attribute) and returns the result.
>
> 3 - Pass this to an ExecuteScript processor, where I extract the query
> result from the flowfile content and write the original JSON object
> (stored in the attribute) out as the new flowfile content, using the
> query result to update properties in the JSON object.
>
> Does this make sense? It feels like there must be a simpler way.
>
> Mike
>
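For what it's worth, the merge in Mike's step 3 is straightforward in plain Python. This is only a sketch of the merge logic itself, with made-up field names (`deviceId`, `location`) and a stubbed-in dict standing in for the SelectHiveQL result; the actual NiFi session/flowfile plumbing (reading the incoming content, writing the outgoing content via a stream callback in ExecuteScript) is not shown:

```python
import json

def merge_hive_result(original_json, hive_row):
    """Merge a single Hive result row (dict of column -> value) into the
    original JSON object, adding/overwriting keys, and return new JSON."""
    record = json.loads(original_json)
    record.update(hive_row)
    return json.dumps(record)

# Example: the flowfile's original JSON (from the attribute) and a
# stand-in for the row returned by the SelectHiveQL query.
original = '{"deviceId": "sensor-42", "reading": 17.5}'
hive_row = {"location": "plant-3", "owner": "ops"}

merged = merge_hive_result(original, hive_row)
```

In a custom processor or an ExecuteScript body, `hive_row` would instead come from the query result in the flowfile content, and `merged` would be written back as the new content.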
