Thanks, Matt, for the suggestion. We tried the lookup approach and
increased the concurrency as well. However, it still took close to 25
minutes with 1M records in both the primary and lookup tables. Please
suggest ways we can tune this.
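
For context, here is a minimal sketch of the two enrichment styles being compared. It is plain Python with sqlite3, not NiFi itself. The table and column names follow this thread; the in-memory database, the sample rows, and the function names are assumptions made up for illustration. The first function does one lookup per purchase_order row, which is roughly what a record-by-record lookup amounts to; the second does the same work as a single set-based UPDATE inside the database.

```python
import sqlite3


def build_db():
    """Create a throwaway in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY AUTOINCREMENT,  -- generated at load time
            ssn         TEXT NOT NULL UNIQUE                -- UNIQUE also creates an index
        );
        CREATE TABLE purchase_order (
            order_id    INTEGER PRIMARY KEY,
            ssn         TEXT NOT NULL,
            customer_id INTEGER                             -- filled in by enrichment
        );
    """)
    conn.executemany("INSERT INTO customer (ssn) VALUES (?)",
                     [("111-11-1111",), ("222-22-2222",)])
    conn.executemany("INSERT INTO purchase_order (order_id, ssn) VALUES (?, ?)",
                     [(1, "222-22-2222"), (2, "111-11-1111"), (3, "999-99-9999")])
    return conn


def enrich_per_record(conn):
    """One SELECT plus one UPDATE per purchase_order row."""
    rows = conn.execute("SELECT order_id, ssn FROM purchase_order").fetchall()
    for order_id, ssn in rows:
        match = conn.execute(
            "SELECT customer_id FROM customer WHERE ssn = ?", (ssn,)
        ).fetchone()
        if match is not None:
            conn.execute(
                "UPDATE purchase_order SET customer_id = ? WHERE order_id = ?",
                (match[0], order_id),
            )


def enrich_set_based(conn):
    """A single UPDATE with a correlated subquery; unmatched rows stay NULL."""
    conn.execute("""
        UPDATE purchase_order
        SET customer_id = (
            SELECT c.customer_id FROM customer c
            WHERE c.ssn = purchase_order.ssn
        )
    """)


def result(conn):
    return conn.execute(
        "SELECT order_id, customer_id FROM purchase_order ORDER BY order_id"
    ).fetchall()
```

Both functions leave purchase_order in the same state. The per-record version pays for a round trip on every row, so with 1M rows the lookup column (ssn here) probably needs an index, and caching or batching may matter. Whether a set-based statement fits our flow is an open question on our side.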

Regards
Sam

On Sat, Mar 14, 2020 at 9:17 PM Matt Burgess <[email protected]> wrote:

> Sam,
>
> This is a common enrichment use case and can be done using
> LookupRecord with a DatabaseRecordLookupService or
> SimpleDatabaseLookupService. You can read in one table (Customer) and
> then look up the values in the purchase_order table based on the value
> of customer_id in each record.
>
> Does this satisfy your use case? If not please let me know, happy to
> help work through this.
>
> Regards,
> Matt
>
>
> On Sat, Mar 14, 2020 at 4:27 AM Samarendra Sahoo
> <[email protected]> wrote:
> >
> >
> > Hello,
> > We have a use case where we have to load two tables, say Customer (where
> > customer ID is a sequence generated while we load the data) and
> > purchase_order. While loading purchase_order, we need to populate
> > customer_id based on the SSN present in the purchase_order table. Because
> > of this dependency, we are trying to build this in one process group:
> > step 1, load customer; step 2, load purchase_order with a dummy
> > customer_id; step 3, join purchase_order and customer on ssn and populate
> > customer_id in purchase_order.
> >
> > While doing so, multiple flow files are generated for the customer table
> > because we load this data by partition. How can we trigger the next
> > processor only once, after all flow files have been processed by the
> > previous processor?
> >
> > Any help, or any better approach to achieve this, would be appreciated.
> >
> > Thanks
> > Sam
> >
> >
>
