Re: Add Max Rows Per Flow File into ExecuteSQL

Matt Burgess Thu, 08 Mar 2018 12:53:33 -0800

Juan,

Glad to hear of your interest in this! Strangely, it seems to be a
popular feature (see the existing Jira [1]) but so far there hasn't
been a PR to address it. This has been done for QueryDatabaseTable,
and one workaround is to use QueryDatabaseTable without specifying a
Maximum Value Column, but that has its own drawbacks (it should only
run on the primary node, doesn't allow incoming connections, and
doesn't yet support arbitrary queries [2]).

One thing I would keep in mind is whether to just break up the result
set into multiple flow files (once Max Rows Per Flow File is reached,
transfer the flow file and open a new flow file), or whether to also
support committing the session after X flow files have been written.
These can certainly be separate features/Jiras and feel free to only
tackle the multiple flow files aspect. I only mention it because we
did the same thing for QueryDatabaseTable. With large result sets,
whether you break them up into multiple flow files or not, they will
not be transferred downstream until the session is committed. If that
is done after all rows are processed, the downstream processors will
be waiting until all flow files are ready. The upside of committing
only at the end is that each flow file can carry the total row count.
However, if the point of Max Rows Per Flow File is to let downstream
processing start early, you could give up the total count in each
flow file in exchange for committing every X flow files, so they can
be processed downstream while ExecuteSQL is still working through the
remaining rows.
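
To make the two options concrete, here is a rough sketch in plain Java.
It deliberately does not use the NiFi API: Batch stands in for a flow
file, a "commit" is just a group of batches released together, and the
names process, maxRowsPerFlowFile and flowFilesPerCommit are made up
for illustration, not existing processor properties. A value of 0 for
either parameter is assumed to mean "no limit".

```java
import java.util.ArrayList;
import java.util.List;

public class BatchingSketch {

    /** Hypothetical stand-in for a flow file: just the rows it holds. */
    static final class Batch {
        final List<Integer> rows = new ArrayList<>();
    }

    /**
     * Splits rowCount rows into batches of at most maxRowsPerFlowFile rows
     * and groups the finished batches into "commits" of flowFilesPerCommit
     * batches each. Each inner list models the flow files that become
     * visible downstream at the same time.
     * A value of 0 for either limit is treated as "no limit" (assumption).
     */
    static List<List<Batch>> process(int rowCount, int maxRowsPerFlowFile, int flowFilesPerCommit) {
        List<List<Batch>> commits = new ArrayList<>();
        List<Batch> pending = new ArrayList<>();
        Batch current = new Batch();

        for (int row = 0; row < rowCount; row++) {
            current.rows.add(row);
            // Max rows reached: close this batch and start a new one.
            if (maxRowsPerFlowFile > 0 && current.rows.size() == maxRowsPerFlowFile) {
                pending.add(current);
                current = new Batch();
                // Enough batches written: release them downstream now.
                if (flowFilesPerCommit > 0 && pending.size() == flowFilesPerCommit) {
                    commits.add(pending);
                    pending = new ArrayList<>();
                }
            }
        }

        // Flush the partially filled batch and whatever is still pending.
        if (!current.rows.isEmpty()) {
            pending.add(current);
        }
        if (!pending.isEmpty()) {
            commits.add(pending);
        }
        return commits;
    }

    public static void main(String[] args) {
        List<List<Batch>> commits = process(10, 3, 2);
        for (int i = 0; i < commits.size(); i++) {
            System.out.println("commit " + i + ": " + commits.get(i).size() + " flow file(s)");
        }
    }
}
```

With flowFilesPerCommit set to 0 everything lands in a single commit at
the end, which is the case where the total row count is known before
anything goes downstream. With a positive value the earlier batches are
released before the total is known, which is the tradeoff described
above.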

Sorry for the long response, just wanted to get my thoughts down
before I forget :)  Let's use the dev list for any further discussion
about this, since it's more geared towards development. Thanks, and
looking forward to your contributions!

Regards,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-1251
[2] https://issues.apache.org/jira/browse/NIFI-1706

On Thu, Mar 8, 2018 at 3:20 PM, Juan Pablo Gardella
<[email protected]> wrote:
> Hello team,
>
> I would like to add "Max Rows Per Flow File" to the ExecuteSQL processor. I
> can create a JIRA and spend some time on that. But before doing this, I would
> like to know if anyone on the team sees a problem with that, or if its
> absence is intentional.
>
> I found that option useful in some use cases.
>
> Thanks,
> Juan
