Mike,

I recommend QueryDatabaseTableRecord with judicious choices for the
following properties:

Fetch Size: This should be tuned to return as many rows as possible
without causing network issues such as timeouts. It can be set to the
same value as Max Rows Per Flow File, ensuring one fetch per outgoing
FlowFile.
Max Rows Per Flow File: This should be set to a reasonable number of
rows per FlowFile, maybe 100K or even 1M if that doesn't cause issues
(see Fetch Size above).
Output Batch Size: This is the key to doing full selects on huge
tables, as it allows FlowFiles to be committed to the session and
passed downstream while the rest of the fetch is still being
processed. In your case, if you set Max Rows Per Flow File to 100K
then this could be 10, or if you set it to 1M it could be 1. Note that
with this property set, the maxvalue.* and fragment.count attributes
will not be set on the outgoing FlowFiles, so you can't merge them. I
believe the max-value state will still be updated even when this
property is used, so the processor should switch to incremental
fetches after the first full fetch is complete. See the example
settings below.

Regards,
Matt

On Wed, Jun 22, 2022 at 10:00 AM Mike Thomsen <[email protected]> wrote:
>
> We have a table with 68M records that will blow up to over 250M soon,
> and need to do a full table fetch on it. What's the best practice for
> efficiently doing a partial or full select on it?
>
> Thanks,
>
> Mike
