Boris,

I would echo the cautions from Bryan & Joe. However, you could conceivably
achieve this by extracting some id into an attribute that associates the two
FlowFiles with each other (for example, 'dataset.id'). Use MergeRecord or
MergeContent to merge the data, either with that attribute as the correlation
attribute or with the Defragment mode. This would get data from both datasets
into the same FlowFile. Then use QueryRecord with the COALESCE function and
GROUP BY to join the columns from both datasets.
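
To make the QueryRecord step a bit more concrete, here is a rough sketch of
the idea, run against SQLite from Python purely as a stand-in. QueryRecord
exposes the FlowFile content as a table named FLOWFILE, but its SQL dialect
may differ from SQLite's in details, and the column names (id, name, amount)
and sample rows are made up for illustration. It assumes the merged FlowFile
holds one record per source row, with the other dataset's fields left NULL.

```python
import sqlite3

# Hypothetical merged records: rows from dataset A carry `name`,
# rows from dataset B carry `amount`; the other field is NULL.
merged_rows = [
    (1, "alice", None),
    (1, None, 10.5),
    (2, "bob", None),
    (2, None, 7.0),
    (3, "carol", None),  # no matching row in dataset B
]

conn = sqlite3.connect(":memory:")
# Table is named FLOWFILE only to mirror what QueryRecord exposes.
conn.execute("CREATE TABLE FLOWFILE (id INTEGER, name TEXT, amount REAL)")
conn.executemany("INSERT INTO FLOWFILE VALUES (?, ?, ?)", merged_rows)

# GROUP BY collapses the rows that share an id into one record.
# MAX() skips NULLs, so it picks the value from whichever dataset has it.
# COALESCE supplies a default when neither row had a value.
query = """
SELECT id,
       MAX(name) AS name,
       COALESCE(MAX(amount), 0.0) AS amount
FROM FLOWFILE
GROUP BY id
ORDER BY id
"""
joined = conn.execute(query).fetchall()

for row in joined:
    print(row)
```

Whether a default such as 0.0 makes sense for unmatched rows depends on your
data; dropping the COALESCE would leave those fields NULL instead.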

Your schema would need to accommodate all of the fields in both datasets, but
if you're running NiFi 1.9.0, the schema inference should handle that...


> On Feb 22, 2019, at 12:24 PM, Boris Tyukin <[email protected]> wrote:
> 
> Thanks Joe and Bryan. In this case I don't need to do it in real-time, 
> probably once a day only.
> 
> I am thinking to trigger both pulls by generateflow processor, then merge 
> datasets somehow since flowfile id will be the same for both sets. And then 
> need to join somehow.
> 
> Would like to use nifi still :)
