Juan,

Glad to hear of your interest in this! Strangely, it seems to be a popular feature (see the existing Jira [1]), but so far there hasn't been a PR to address it. This has been done for QueryDatabaseTable, and one workaround is to use QueryDatabaseTable without specifying a Maximum Value Column, but that has its own drawbacks (it should only run on the primary node, doesn't allow incoming connections, and doesn't yet support arbitrary queries [2]).

One thing I would keep in mind is whether to just break up the result set into multiple flow files (once Max Rows Per Flow File is reached, transfer the flow file and open a new one), or whether to also support committing the session after X flow files have been written. These can certainly be separate features/Jiras, and feel free to tackle only the multiple flow files aspect. I only mention it because we did the same thing for QueryDatabaseTable.

With large result sets, whether you break them up into multiple flow files or not, they will not be transferred downstream until the session is committed. If that is done after all rows are processed, the downstream processors will be waiting until all flow files are ready. The upside of committing only at the end is that the flow file(s) can contain the total row count. However, if the use case for Max Rows Per Flow File is to let downstream processing start sooner, you could give up the total count on each flow file in exchange for committing every X flow files, so that they can be processed downstream while ExecuteSQL is still processing more rows.

Sorry for the long response, just wanted to get my thoughts down before I forget :) Let's use the dev list for any further discussion about this, since it's more geared towards development.

Thanks, and looking forward to your contributions!

Regards,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-1251
[2] https://issues.apache.org/jira/browse/NIFI-1706

On Thu, Mar 8, 2018 at 3:20 PM, Juan Pablo Gardella <[email protected]> wrote:

> Hello team,
>
> I would like to add "Max Rows Per Flow File" to ExecuteSQL processor. I can
> create a JIRA and spent some time into that. But before doing this, I would
> like to know if someone of the team see a problem with that or, if that is
> intentional.
>
> I found that option useful in some use cases.
>
> Thanks,
> Juan
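
For illustration only, here is a rough sketch of the two ideas discussed in the reply: splitting a result set by a maximum row count, and handing flow files downstream in batches rather than all at once. This is a plain Python simulation, not NiFi code. The function name `batch_rows` and its parameters are hypothetical, "flow files" are just lists of rows, and each yielded batch stands in for one session commit.

```python
from itertools import islice


def batch_rows(rows, max_rows_per_flow_file, flow_files_per_commit):
    """Simulate splitting a result set into flow files and committing in batches.

    Each yielded value is a list of "flow files" (lists of rows) that would be
    released downstream by one simulated commit. Hypothetical helper, not a NiFi API.
    """
    it = iter(rows)
    pending = []  # flow files written but not yet "committed"
    while True:
        chunk = list(islice(it, max_rows_per_flow_file))
        if not chunk:
            break
        pending.append(chunk)
        if len(pending) == flow_files_per_commit:
            # Downstream can start on these before the result set is exhausted,
            # but the total row count is not yet known at this point.
            yield pending
            pending = []
    if pending:
        yield pending  # final partial batch


commits = list(batch_rows(range(10), max_rows_per_flow_file=3, flow_files_per_commit=2))
```

Because the early batches are released before the iterator is exhausted, none of them can carry the final row count, which is the tradeoff described in the reply.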
