Do you have an index on the lookup table so it doesn't have to do a full table scan?
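
In case it helps, here is a minimal sketch of how to check whether a lookup is hitting an index. It uses an in-memory SQLite database purely for illustration; the table layout (customer with customer_id and ssn), the index name idx_customer_ssn, and the query_plan helper are my assumptions, not your actual schema. Your database will have its own EXPLAIN syntax and plan output.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)


# Hypothetical lookup table: customer_id is generated, ssn is the lookup key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, ssn TEXT)")
conn.executemany(
    "INSERT INTO customer (ssn) VALUES (?)",
    [(f"{i:09d}",) for i in range(1000)],
)

lookup = "SELECT customer_id FROM customer WHERE ssn = ?"

# Without an index on ssn, every lookup has to scan the whole table.
plan_before = query_plan(conn, lookup, ("000000042",))

# With an index on the lookup key, the same query becomes an index search.
conn.execute("CREATE INDEX idx_customer_ssn ON customer (ssn)")
plan_after = query_plan(conn, lookup, ("000000042",))

print("before:", plan_before)
print("after: ", plan_after)
```

If the plan for the per-record lookup query still shows a scan on your side, each of the 1M lookups is reading the whole table, which would go a long way toward explaining the 25 minutes.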

On Thu, Mar 19, 2020 at 3:40 AM Samarendra Sahoo <[email protected]> wrote:

> Thanks Matt for the suggestion. We tried with the look-up operator and
> increased the concurrency as well. However, it was taking close to 25
> minutes for 1M records on both the primary and look-up tables. Pls suggest
> ways we can tune.
>
> Regards
> Sam
>
> On Sat, Mar 14, 2020 at 9:17 PM Matt Burgess <[email protected]> wrote:
>
>> Sam,
>>
>> This is a common enrichment use case and can be done using
>> LookupRecord with a DatabaseRecordLookupService or
>> SimpleDatabaseLookupService. You can read in one table (Customer) and
>> then lookup the values in the purchase_order table based on the value
>> of customer_id in each record.
>>
>> Does this satisfy your use case? If not please let me know, happy to
>> help work through this.
>>
>> Regards,
>> Matt
>>
>> On Sat, Mar 14, 2020 at 4:27 AM Samarendra Sahoo
>> <[email protected]> wrote:
>> >
>> > Hello,
>> > We have a use case where we have to load two tables, say Customer (here
>> > customer ID is a sequence and gets generated while we load data) and
>> > purchase_order. While loading purchase_order we need to populate
>> > customer_id based on the SSN present in the purchase_order table. Since
>> > there is this dependency, we are trying to create this in one process
>> > group with Step 1 - load customer, Step 2 - load purchase order with
>> > dummy customer_id, Step 3 - join purchase_order and customer based on
>> > ssn and populate customer_id in purchase_order.
>> >
>> > While doing so, there are multiple flow files generated for the customer
>> > table as we are loading this data based on partition. Would like to know
>> > how to trigger the next processor only once, when all flow files are
>> > processed by the previous processor?
>> >
>> > Looking for help, or are there any better approaches to achieve this?
>> >
>> > Thanks
>> > Sam
