Mike, I recommend QueryDatabaseTableRecord with judicious choices for the following properties:
Fetch Size: This should be tuned to return as many rows as possible without causing network issues such as timeouts. It can be set to the same value as Max Rows Per Flow File, ensuring one fetch per outgoing FlowFile.

Max Rows Per Flow File: This should be set to a reasonable number of rows per FlowFile, maybe 100K or even 1M if that doesn't cause issues (see above).

Output Batch Size: This is the key to doing full selects on huge tables, as it allows FlowFiles to be committed to the session and passed downstream while the rest of the fetch is being processed. In your case, if you set Max Rows Per Flow File to 100K then this could be 10, or if you set it to 1M it could be 1 (see the example settings below).

Note that with this property set, the maxvalue.* and fragment.count attributes will not be set on these FlowFiles, so you can't merge them. I believe the maxvalue state will still be updated even if this property is used, so the processor should switch to incremental fetches after the first full fetch is complete.

Regards,
Matt

On Wed, Jun 22, 2022 at 10:00 AM Mike Thomsen <[email protected]> wrote:
>
> We have a table with 68M records that will blow up to over 250M soon,
> and need to do a full table fetch on it. What's the best practice for
> efficiently doing a partial or full select on it?
>
> Thanks,
>
> Mike
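
For concreteness, a minimal sketch of the two QueryDatabaseTableRecord configurations described above, using the example numbers from the message. These values are illustrative starting points to be tuned against your network and heap limits, not verified recommendations, and all other processor properties are assumed to keep their defaults:

    Option 1 (100K rows per FlowFile):
        Fetch Size:              100000
        Max Rows Per Flow File:  100000
        Output Batch Size:       10

    Option 2 (1M rows per FlowFile):
        Fetch Size:              1000000
        Max Rows Per Flow File:  1000000
        Output Batch Size:       1

In both options the table is emitted in roughly 10 downstream batches at a time per fetch cycle, while Fetch Size matching Max Rows Per Flow File keeps each database round trip aligned with one outgoing FlowFile.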
