@Boris: Mark's approach will work for a lot of scenarios. I've used it extensively with different clients.
On Fri, Feb 22, 2019 at 1:10 PM Mark Payne <[email protected]> wrote:

> This is certainly a better route to go than my previous suggestion :) Have
> one flow that grabs one of the datasets and stores it somewhere. In a CSV
> or XML file, even. Then have a second flow that pulls the other dataset and
> uses LookupRecord to perform the enrichment. The CSVLookupService and
> XMLLookupService would automatically reload when the data is updated.
> We should probably have a JDBCLookupService as well, which would allow for
> dynamic lookups against a database. I thought that existed already, but it
> does not appear to. The point is, you can treat DataSet A as the 'reference
> dataset' and DataSet B as the 'streaming dataset', and then use LookupRecord
> in order to do the enrichment/join.
>
> Unfortunately, I don't seem to be able to find any blogs that describe this
> pattern, but it would certainly make for a good blog. Generally, you'd have
> two flows set up, though:
>
> Flow A (get the enrichment dataset):
> ExecuteSQLRecord (write as CSV) -> PutFile
>
> Flow B (enrich the other dataset):
> ExecuteSQLRecord -> LookupRecord (uses a CSVLookupService that loads the
> file written by the other flow) -> PublishKafkaRecord_2_0
>
> Thanks
> -Mark
>
>
> On Feb 22, 2019, at 12:30 PM, Joe Witt <[email protected]> wrote:
>
> I should add that you can use NiFi to update the reference dataset in a
> database/backing store in one flow, and have another flow that handles the
> live stream/lookup, etc. Mark Payne/others: I think there are blogs that
> describe this pattern. Anyone have links?
>
> On Fri, Feb 22, 2019 at 12:27 PM Joe Witt <[email protected]> wrote:
>
>> Boris,
>>
>> Great. So have a process to load the periodic dataset into a lookup
>> service. Could be backed by a simple file, a database, Hive, whatever.
>> Then have the live flow run against that.
>>
>> This reminds me - we should make a Kudu-based lookup service, I think.
>> I'll chat with some of our new Kudu friends about this.
>>
>> Thanks
>>
>> On Fri, Feb 22, 2019 at 12:25 PM Boris Tyukin <[email protected]>
>> wrote:
>>
>>> Thanks Joe and Bryan. In this case I don't need to do it in real time,
>>> probably only once a day.
>>>
>>> I am thinking of triggering both pulls with a GenerateFlowFile
>>> processor, then merging the datasets somehow, since the flowfile ID
>>> will be the same for both sets. And then I need to join somehow.
>>>
>>> Would like to use NiFi still :)
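For anyone skimming the thread, here is a minimal Python sketch of the pattern Mark describes: Flow A materializes the reference dataset as a CSV, and Flow B joins each streaming record against it, the way LookupRecord with a CSVLookupService would. The field names (`dept_id`, `dept_name`, `patient`) and the inlined CSV are purely hypothetical illustration, not anything from the actual flows discussed.

```python
import csv
import io

# Hypothetical reference dataset. In NiFi terms, Flow A
# (ExecuteSQLRecord -> PutFile) would write this CSV to disk;
# it is inlined here so the sketch is self-contained.
REFERENCE_CSV = """dept_id,dept_name
10,Cardiology
20,Radiology
"""

def load_lookup(csv_text, key_field):
    """Build an in-memory lookup table keyed on key_field,
    roughly what a CSVLookupService does when it loads the file."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_field]: row for row in reader}

def enrich(records, lookup, key_field):
    """Join each streaming record against the reference dataset,
    analogous to LookupRecord in Flow B."""
    for rec in records:
        match = lookup.get(rec.get(key_field))
        if match:
            # Copy the enrichment fields onto the record.
            rec.update({k: v for k, v in match.items() if k != key_field})
        yield rec

lookup = load_lookup(REFERENCE_CSV, "dept_id")
stream = [{"patient": "A", "dept_id": "10"},
          {"patient": "B", "dept_id": "20"}]
enriched = list(enrich(stream, lookup, "dept_id"))
# enriched[0] -> {'patient': 'A', 'dept_id': '10', 'dept_name': 'Cardiology'}
```

Reloading the lookup when Flow A rewrites the file (which the CSVLookupService handles automatically in NiFi) is omitted here for brevity.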
