Boris, I would echo the cautions from Bryan & Joe. However, you could conceivably achieve this by extracting some id into an attribute that associates the two FlowFiles with each other (for example 'dataset.id'). Then use MergeRecord or MergeContent to merge the data, either with that attribute as the correlation attribute or with the Defragment mode. This gets the data from both datasets into the same FlowFile. Finally, use QueryRecord with the COALESCE function and GROUP BY to join together the columns from both datasets.
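To make the COALESCE / GROUP BY idea concrete, here is a rough sketch of the kind of query I have in mind. It is only an illustration: it runs the SQL against an in-memory SQLite table from Python as a stand-in for QueryRecord (which exposes the FlowFile's records as a table named FLOWFILE), and the column names (id, name, amount), the sample rows, and the default values are all made up. The exact SQL you need in QueryRecord may differ for your data.

```python
import sqlite3

# Hypothetical merged records: after the merge step, rows from both datasets
# sit side by side, and each row only fills in the columns its own dataset has.
rows = [
    # (id, name, amount)
    (1, "alice", None),  # from dataset A
    (2, "bob", None),    # from dataset A
    (1, None, 10.5),     # from dataset B
    (3, None, 7.0),      # from dataset B, no matching row in A
]

# SQLite stands in for QueryRecord here; FLOWFILE mirrors the table name
# QueryRecord uses for the incoming records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE FLOWFILE (id INTEGER, name TEXT, amount REAL)")
conn.executemany("INSERT INTO FLOWFILE VALUES (?, ?, ?)", rows)

# GROUP BY collapses the rows that share an id into one row. MAX() picks the
# non-null value for each column (aggregates skip NULLs), and COALESCE supplies
# a default when neither dataset had a value for that id.
query = """
SELECT id,
       COALESCE(MAX(name), 'unknown') AS name,
       COALESCE(MAX(amount), 0.0)     AS amount
FROM FLOWFILE
GROUP BY id
ORDER BY id
"""

joined = conn.execute(query).fetchall()
for row in joined:
    print(row)
```

Note that this behaves like a full outer join: ids that appear in only one dataset still come through, with the defaults filled in for the missing columns.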
Your schema would need to accommodate all of the fields in both datasets, but if you're running 1.9.0, the schema inference should handle that...

> On Feb 22, 2019, at 12:24 PM, Boris Tyukin <[email protected]> wrote:
>
> Thanks Joe and Bryan. In this case I don't need to do it in real-time,
> probably once a day only.
>
> I am thinking to trigger both pulls by generateflow processor, then merge
> datasets somehow since flowfile id will be the same for both sets. And then
> need to join somehow.
>
> Would like to use nifi still :)
