Do you have an index on the lookup table so it doesn't have to do a full
table scan?
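
As a rough illustration of why this matters, here is a minimal sketch using
Python's built-in sqlite3 module rather than your actual database. The table
layout, the sample data, and the index name idx_customer_ssn are assumptions
made up for the example. It asks the database for its query plan for a
lookup by ssn before and after the index is created:

```python
import sqlite3


def lookup_plan(conn, ssn):
    """Return the query plan text for a single lookup by ssn."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT customer_id FROM customer WHERE ssn = ?",
        (ssn,),
    ).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " ".join(row[-1] for row in rows)


# Hypothetical stand-in for the lookup table (not the real schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, ssn TEXT)")
conn.executemany(
    "INSERT INTO customer (ssn) VALUES (?)",
    [(f"{i:09d}",) for i in range(1000)],
)

# Without an index on ssn, every lookup has to scan the whole table.
plan_before = lookup_plan(conn, "000000042")

# With an index on the lookup key, the database can search the index instead.
conn.execute("CREATE INDEX idx_customer_ssn ON customer (ssn)")
plan_after = lookup_plan(conn, "000000042")

print("before:", plan_before)
print("after: ", plan_after)
```

The exact plan wording depends on the database and version, but with 1M
lookups the difference between a scan per record and an index search per
record is usually what dominates the run time. On your database the
equivalent would be an index on whichever column the lookup service uses as
its key.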

On Thu, Mar 19, 2020 at 3:40 AM Samarendra Sahoo <[email protected]>
wrote:

> Thanks, Matt, for the suggestion. We tried the look-up operator and
> increased the concurrency as well. However, it took close to 25
> minutes for 1M records in both the primary and look-up tables. Please
> suggest ways we can tune this.
>
> Regards
> Sam
>
> On Sat, Mar 14, 2020 at 9:17 PM Matt Burgess <[email protected]> wrote:
>
>> Sam,
>>
>> This is a common enrichment use case and can be done using
>> LookupRecord with a DatabaseRecordLookupService or
>> SimpleDatabaseLookupService. You can read in one table (Customer) and
>> then look up the values in the purchase_order table based on the value
>> of customer_id in each record.
>>
>> Does this satisfy your use case? If not please let me know, happy to
>> help work through this.
>>
>> Regards,
>> Matt
>>
>>
>> On Sat, Mar 14, 2020 at 4:27 AM Samarendra Sahoo
>> <[email protected]> wrote:
>> >
>> >
>> > Hello,
>> > We have a use case where we have to load two tables, say Customer (here
>> > customer ID is a sequence and gets generated while we load data) and
>> > purchase_order. While loading purchase_order, we need to populate
>> > customer_id based on the SSN present in the purchase_order table. Since
>> > there is this dependency, we are trying to create this in one process
>> > group with step 1 - load customer, step 2 - load purchase order with a
>> > dummy customer_id, step 3 - join purchase_order and customer based on
>> > ssn and populate customer_id in purchase_order.
>> >
>> > While doing so, multiple flow files are generated for the customer
>> > table, as we are loading this data based on partition. We would like to
>> > know how to trigger the next processor only once, after all flow files
>> > have been processed by the previous processor.
>> >
>> > We are looking for help, or for any better approaches to achieve this.
>> >
>> > Thanks
>> > Sam
>> >
>> >
>>
>
