I think the lookup processor should return data in a format that can be 
efficiently parsed by the NiFi Expression Language, for example JSON. 
This would avoid the need for an additional “Extract”-type processor. All the 
downstream processors can simply use “jsonPath” for further lookups inside the 
attribute.
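
A rough sketch of the idea above, in Python rather than NiFi code: the lookup 
result is serialized once as JSON text into a single attribute, and a 
downstream step pulls out one field without a separate extract step. The names 
lookup_to_attribute, read_field, and the attribute name "lookup.result" are 
hypothetical and only for illustration; in a real flow the downstream step 
would be an Expression Language jsonPath call on the attribute.

```python
import json


def lookup_to_attribute(attributes, record, result_attribute="lookup.result"):
    """Return a copy of the attributes with the looked-up record stored as JSON text.

    `result_attribute` is a hypothetical attribute name chosen for this sketch.
    """
    enriched = dict(attributes)
    enriched[result_attribute] = json.dumps(record, sort_keys=True)
    return enriched


def read_field(attributes, result_attribute, field):
    """Stand-in for a downstream jsonPath-style read of one top-level field."""
    return json.loads(attributes[result_attribute]).get(field)


# Example: enrich a flow file's attributes, then read one field downstream.
attrs = lookup_to_attribute({"customer.id": "42"}, {"name": "Ada", "city": "London"})
city = read_field(attrs, "lookup.result", "city")
```

Keeping the whole record in one JSON attribute means only one attribute name 
has to be agreed on between the lookup and the processors that follow it.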

Regards,
Manish

From: Matt Burgess [mailto:[email protected]]
Sent: Friday, September 02, 2016 6:37 PM
To: [email protected]
Subject: Re: Processor to enrich attribute from external service

Manish,

Some of the queries in those processors could bring back lots of data, and 
putting them into an attribute could cause memory issues. Another concern is 
when the result is binary data, such as ExecuteSQL returning an Avro file. And 
since the return of these is a collection of records, these processors are 
often followed by a Split processor to perform operations on individual records.

Having said that, if the return value is text and you'd like to transfer it to 
an attribute, you can use ExtractText to put the content into an attribute. For 
small content (which is the appropriate use case), this should be pretty fast, 
and keeps the logic in a single processor instead of duplicated (either 
logically or physically) across processors.

By the way, I'm very interested in an RDBMS lookup processor, but I'm not sure 
I'd have time in the short run to write it up. If someone takes a crack at it, 
I recommend properties to pre-cache the table with a refresh interval. This way, 
if the lookup table doesn't change much and is not too big, it could be read 
into the processor's memory for super-fast lookups. Alternatively, a property 
could be a cache size, which would build a subset of the table in memory as 
values are looked up. This is probably more robust, since the cache is bounded, 
and if the size is set high enough for a small table, the table would end up 
being read in its entirety. I'd still want the cache refresh property, though.
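
A minimal sketch of the second option (a bounded cache filled as values are 
looked up, plus a refresh interval), written in Python for brevity rather than 
as NiFi processor code. The class name LookupCache and the parameters 
max_size, refresh_seconds, loader, and clock are assumptions made for this 
example; loader stands in for the actual database query. The sketch is not 
thread-safe, which a real processor running concurrent tasks would need to 
address.

```python
import time
from collections import OrderedDict


class LookupCache:
    """Bounded, refreshable cache in front of a slow lookup (hypothetical sketch)."""

    def __init__(self, loader, max_size=1000, refresh_seconds=300.0,
                 clock=time.monotonic):
        self._loader = loader                # callable: key -> value (e.g. a DB query)
        self._max_size = max_size            # upper bound on cached entries
        self._refresh_seconds = refresh_seconds
        self._clock = clock                  # injectable so tests can control time
        self._entries = OrderedDict()        # key -> (value, loaded_at), oldest first

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._refresh_seconds:
            # Fresh hit: mark as most recently used and return the cached value.
            self._entries.move_to_end(key)
            return entry[0]
        # Miss or stale entry: reload from the backing store.
        value = self._loader(key)
        self._entries[key] = (value, now)
        self._entries.move_to_end(key)
        # Evict least recently used entries until the bound is respected.
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)
        return value
```

With max_size at least as large as the table, every row that is ever looked up 
stays cached until its refresh interval expires, which approximates the 
"read the whole table" behaviour for small tables.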

Cheers,
Matt

On Sep 2, 2016, at 6:19 PM, Manish Gupta 8 <[email protected]> wrote:
Thanks for the reply, Joe. Just a thought: do you think it would be a good idea 
for every Get processor (GetMongo, GetHBase, etc.) to have two additional 
properties like:

1. Result in Content or Result in Attribute

2. Result Attribute Name (only applicable when “Result in Attribute” is 
selected).

But then all such processors would need to accept an incoming flowfile (which 
they don’t as of now, being “Get” processors).

Maybe ExecuteSQL and FetchDistributedMapCache can be enhanced that way, i.e. 
have an option to specify the destination: content or attribute?

Regards,
Manish

From: Joe Witt [mailto:[email protected]]
Sent: Friday, September 02, 2016 5:58 PM
To: [email protected]
Subject: Re: Processor to enrich attribute from external service


You would need to make a custom processor for now. I think we should have a 
nice controller service that generalizes JDBC lookups and supports caching, and 
then a processor that leverages it.

This comes up fairly often and is pretty straightforward from a design POV.  
Anyone want to take a stab at this?

On Sep 2, 2016 4:47 PM, "Manish Gupta 8" <[email protected]> wrote:
Hello Everyone,

Is there a processor that we can use for updating/adding an attribute of an 
incoming flow file from some external service (say MongoDB, Couchbase, or any 
RDBMS)? The processor would use an attribute of the incoming flow file, query 
the external service, and simply modify/add an attribute on the flow file 
(without touching the flow file content).
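
To make the requested behaviour concrete, here is a small Python sketch (not 
NiFi code) of an attribute-only enrichment step. The function name enrich and 
the parameters key_attribute, result_attribute, and query are hypothetical; 
query stands in for the call to MongoDB, Couchbase, or an RDBMS.

```python
def enrich(attributes, content, key_attribute, result_attribute, query):
    """Add a looked-up value as an attribute; the content is passed through as-is.

    `query` is a placeholder callable for the external service: key -> value or None.
    """
    enriched = dict(attributes)          # never mutate the incoming attributes
    key = attributes.get(key_attribute)
    if key is not None:
        value = query(key)
        if value is not None:
            enriched[result_attribute] = str(value)
    return enriched, content


# Example with an in-memory dict standing in for the external service.
service = {"42": "gold"}
new_attrs, new_content = enrich(
    {"customer.id": "42"}, b"payload", "customer.id", "customer.tier", service.get
)
```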

If we have to achieve this kind of “lookup” operation (but only to update an 
attribute and not the content), what are the options in NiFi?
Should we create a custom processor (maybe by taking the GetMongo processor and 
modifying its code to update an attribute with the query result)?

Thanks,
Manish
