Thanks, Matt, for the suggestion. We tried LookupRecord and increased the concurrency as well. However, it took close to 25 minutes with 1M records in each of the primary and lookup tables. Please suggest ways we can tune this.
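For context, step 3 from the original message (join purchase_order and customer on SSN and populate customer_id) amounts to a single set-based update rather than a per-record lookup. A minimal sketch using Python's built-in sqlite3 module with an in-memory database; the table layout, column names, index, and sample rows are assumptions for illustration only, not the actual schema or any NiFi configuration:

```python
import sqlite3


def populate_customer_ids(conn):
    """Set purchase_order.customer_id from customer by matching on SSN.

    Returns the number of purchase_order rows that were updated.
    """
    cur = conn.execute(
        """
        UPDATE purchase_order
        SET customer_id = (
            SELECT c.customer_id FROM customer c
            WHERE c.ssn = purchase_order.ssn
        )
        WHERE EXISTS (
            SELECT 1 FROM customer c WHERE c.ssn = purchase_order.ssn
        )
        """
    )
    conn.commit()
    return cur.rowcount


# Hypothetical schema: customer_id is generated on load, as in the thread.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY AUTOINCREMENT,
        ssn TEXT UNIQUE
    );
    CREATE TABLE purchase_order (
        order_id INTEGER PRIMARY KEY,
        ssn TEXT,
        customer_id INTEGER
    );
    CREATE INDEX idx_po_ssn ON purchase_order (ssn);
    """
)

# Step 1: load customers (IDs are assigned by the database).
conn.executemany("INSERT INTO customer (ssn) VALUES (?)", [("111",), ("222",)])

# Step 2: load purchase orders with a dummy customer_id of -1.
conn.executemany(
    "INSERT INTO purchase_order (order_id, ssn, customer_id) VALUES (?, ?, -1)",
    [(1, "222"), (2, "111"), (3, "999")],
)

# Step 3: one set-based update instead of one lookup per record.
updated = populate_customer_ids(conn)
rows = conn.execute(
    "SELECT order_id, customer_id FROM purchase_order ORDER BY order_id"
).fetchall()
print(updated, rows)
```

Orders whose SSN has no matching customer keep the dummy customer_id, so they can be found and handled separately afterwards.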
Regards
Sam

On Sat, Mar 14, 2020 at 9:17 PM Matt Burgess <[email protected]> wrote:
> Sam,
>
> This is a common enrichment use case and can be done using
> LookupRecord with a DatabaseRecordLookupService or
> SimpleDatabaseLookupService. You can read in one table (Customer) and
> then lookup the values in the purchase_order table based on the value
> of customer_id in each record.
>
> Does this satisfy your use case? If not please let me know, happy to
> help work through this.
>
> Regards,
> Matt
>
>
> On Sat, Mar 14, 2020 at 4:27 AM Samarendra Sahoo
> <[email protected]> wrote:
> >
> > Hello,
> > We have use case where we have to load two tables say Customer (here
> > customer ID is a sequence and gets generated while we load data) and
> > purchase_order. While loading purchase_order need to populate customer_id
> > based on SSN present in the purchase_order table. Since there is this
> > dependency, trying to create this in one process group with Step1 - load
> > customer, step2 - load purchase order with dummy customer_id, step 3 - join
> > purchase_order and customer based on ssn and populate customer_id in
> > purchase_order.
> >
> > While doing so, there are multiple flow files generated for customer
> > table as we are loading this data based on partition. Would like to know,
> > how to trigger next processor only once, when all flow files are processed
> > by previous processor?
> >
> > Looking for help or if there are any better approaches to achieve this?
> >
> > Thanks
> > Sam
