Interesting. Thanks for that feedback, Harald. It might make sense to be more
surgical about this, disabling it for MergeContent, for example, instead of for
all processors in the middle of a flow.

Thanks
-Mark


> On Feb 22, 2023, at 5:42 AM, Dobbernack, Harald (Key-Work) 
> <[email protected]> wrote:
> 
> Just responding to this part:
>> You should not be using CRON driven scheduling for any processors in the
>> middle of a flow. In fact, we really should probably just disable that
>> altogether.
> Please don't disable this! We actually use CRON scheduling for some of our
> PutSFTP processors: these SFTP servers have service windows that we are
> supposed to respect by not connecting during them, since the servers will
> not be available then anyway. Of course we could also route to a Wait
> processor once we reach a time when the destination should not be called,
> but it is so much simpler to be able to tell a processor in the middle of
> the flow when not to run.
> 
> -----Original Message-----
> From: Mark Payne <[email protected]>
> Sent: Tuesday, February 21, 2023 21:37
> To: [email protected]; John McGinn <[email protected]>
> Subject: Re: Processor with cron scheduling in middle of flow
> 
> John,
> 
> You should not be using CRON driven scheduling for any processors in the
> middle of a flow. In fact, we really should probably just disable that
> altogether. It’s exceedingly rare that you’d want anything other than
> Timer-Driven with a Run Schedule of 0 sec.
> MergeContent will not create any merged output on its first iteration after
> it’s scheduled to run; it requires at least a second iteration before
> anything is transferred. So a CRON schedule that triggers it only once an
> hour will not produce any merged output on the first trigger. Its algorithm
> has evolved over time, so it may well have happened to work previously, but
> it’s really not being configured as intended.
> 
> When you say that you’re retrieving data from a few sources and then “merges
> that all back into a single file”, does that mean that you started with one
> FlowFile, split it up, and then want to reassemble the data after performing
> enrichment? If so, you’ll want to use a Merge Strategy of Defragment.
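For illustration, a minimal Python sketch of the idea behind the Defragment strategy, assuming FlowFiles are modeled as plain dicts of attributes plus a hypothetical `content` key. The `fragment.identifier`, `fragment.index` and `fragment.count` attributes are the ones NiFi's split processors write; the function itself is hypothetical and shows only the grouping logic, not MergeContent's implementation.

```python
from typing import Dict, List


def defragment(flowfiles: List[dict]) -> Dict[str, bytes]:
    """Reassemble split FlowFiles (hypothetical dict model, not NiFi's API)."""
    groups: Dict[str, List[dict]] = {}
    for ff in flowfiles:
        # Group fragments that came from the same original FlowFile.
        groups.setdefault(ff["fragment.identifier"], []).append(ff)

    merged: Dict[str, bytes] = {}
    for ident, parts in groups.items():
        expected = int(parts[0]["fragment.count"])
        if len(parts) < expected:
            continue  # Incomplete group: keep waiting for the remaining fragments.
        # Restore the original order before concatenating the content.
        parts.sort(key=lambda p: int(p["fragment.index"]))
        merged[ident] = b"".join(p["content"] for p in parts)
    return merged
```

In this model a group is only emitted once all of its `fragment.count` pieces have arrived, so the size of each merged file comes from the split itself rather than from how long the processor waited.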
> 
> If you’re just trying to bring in some data and merge it together by
> correlation attribute, then Bin Packing makes sense. Here, you have a few
> properties that you can use to try to get the best bin packing. In short, a
> bin will be merged when any of these conditions is met:
> 
> - The Minimum Group Size is reached AND the Minimum Number of Entries is met
> - The Maximum Group Size OR the Maximum Number of Entries is reached
> - A bin has been waiting for the “Max Bin Age” amount of time
> - If a correlation attribute is used, and a FlowFile comes in that can’t go
> into any existing bin while the Maximum number of Bins has already been
> reached, the oldest bin will be evicted.
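The rules listed above can be sketched in Python as follows. This is a simplified model that follows the list as written, not NiFi's source: `BinSettings`, `Bin`, `completion_reason` and `add_flowfile` are hypothetical names, sizes are plain byte counts, and time is a float in seconds.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class BinSettings:
    # Hypothetical stand-ins for the MergeContent properties named above.
    min_group_size: int            # bytes
    max_group_size: Optional[int]  # bytes; None means unbounded
    min_entries: int
    max_entries: int
    max_bin_age: Optional[float]   # seconds; None means no age limit
    max_bins: int


@dataclass
class Bin:
    created_at: float
    size: int = 0
    entries: int = 0


def completion_reason(b: Bin, s: BinSettings, now: float) -> Optional[str]:
    """Return why a bin would be merged now, or None if it keeps waiting."""
    if s.max_group_size is not None and b.size >= s.max_group_size:
        return "max group size"
    if b.entries >= s.max_entries:
        return "max entries"
    if b.size >= s.min_group_size and b.entries >= s.min_entries:
        return "min size and min entries"
    if s.max_bin_age is not None and now - b.created_at >= s.max_bin_age:
        return "max bin age"
    return None


def add_flowfile(bins: Dict[str, Bin], key: str, size: int, now: float,
                 s: BinSettings) -> Optional[str]:
    """Put a FlowFile in the bin for its correlation key; return any evicted key."""
    evicted = None
    if key not in bins:
        if len(bins) >= s.max_bins:
            # No room for a new bin: evict (merge) the oldest one.
            evicted = min(bins, key=lambda k: bins[k].created_at)
            del bins[evicted]
        bins[key] = Bin(created_at=now)
    bins[key].size += size
    bins[key].entries += 1
    return evicted
```

With a small Maximum number of Bins and many distinct correlation values, this model evicts bins long before they reach their minimums, which is one way to end up with many small merged files.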
> 
> If you’re seeing bins smaller than expected, you can look at the Data
> Provenance for the merged FlowFile, and it will tell you exactly which of the
> conditions above triggered the merge. That should help you tune these
> settings.
> 
> Hope this is helpful.
> 
> Thanks
> -Mark
> 
> 
>> On Feb 17, 2023, at 1:39 PM, John McGinn via users <[email protected]> 
>> wrote:
>> 
>> Hello,
>> 
>> NiFi 1.19.0 - I need some help making my idea work, or figuring out a
>> better way to do this.
>> 
>> I've got a flow that retrieves data from a few data sources, enhances
>> individual FlowFiles, converts attributes to CSV and then merges it all
>> back into a single file. It takes roughly 20 minutes for the process to run
>> from the start to the MergeContent step, so when I do it manually, I keep
>> the MergeContent processor stopped until all FlowFiles are waiting in its
>> queue, and then I start it. (Run Once doesn't work for some reason.) That
>> works fine, manually.
>> 
>> When I try to add CRON scheduling, it never kicks off. For instance, the
>> initial processor in the flow has a CRON schedule for the top of the hour
>> (0 0 * * * ?). I then set 25 past the hour for MergeContent (0 25 * * * ?).
>> When I start the flow, the FlowFiles are generated and queued up in front of
>> MergeContent by 25 minutes past the hour, but MergeContent never kicks
>> off.
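NiFi's CRON driven strategy uses Quartz cron syntax, where the six fields are seconds, minutes, hours, day of month, month and day of week, so `0 25 * * * ?` fires at 25 minutes past every hour. A minimal Python sketch for just that hourly pattern, combined with Mark's point in his reply that MergeContent produces nothing on its first iteration, shows when merged output could appear at the earliest. The helper names are hypothetical, time zones and DST are ignored, and this is not how NiFi evaluates cron expressions.

```python
from datetime import datetime, timedelta


def next_fire(after: datetime, second: int, minute: int) -> datetime:
    """Next time strictly after `after` matching a Quartz-style 'S M * * * ?'.

    Only handles this one hourly pattern (fixed second and minute, every hour)
    on naive datetimes, so time zones and DST are ignored.
    """
    candidate = after.replace(minute=minute, second=second, microsecond=0)
    if candidate <= after:
        candidate += timedelta(hours=1)
    return candidate


def earliest_merge_output(after: datetime, second: int, minute: int) -> datetime:
    """Earliest merged output if MergeContent only bins on its first trigger."""
    first = next_fire(after, second, minute)  # iteration 1: FlowFiles are binned
    return next_fire(first, second, minute)   # iteration 2: first chance to merge


# Flow started at 13:00:05; MergeContent is scheduled with "0 25 * * * ?".
start = datetime(2023, 2, 17, 13, 0, 5)
print(next_fire(start, 0, 25))              # 2023-02-17 13:25:00
print(earliest_merge_output(start, 0, 25))  # 2023-02-17 14:25:00
```

Under these assumptions, watching only the first trigger at 13:25 would look like MergeContent "never kicks off", even though the schedule itself fired.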
>> 
>> I recently added a correlation attribute and removed the CRON entry, but
>> MergeContent just merges the FlowFiles into many small bunches.
>> 
>> I even tried putting a CRON schedule on AttributesToCSV, with a Max Bin Age
>> on MergeContent, since AttributesToCSV takes less than a minute to process
>> the FlowFiles at that point, but the CRON schedule didn't kick off there
>> either.
>> 
>> Any ideas on how to get this to work?
>> 
>> Thanks,
>> John
> 
> 
> 
> Harald Dobbernack
> 
> Key-Work Consulting GmbH | Kriegsstr. 100 | 76133 | Karlsruhe | Germany |
> www.key-work.de |
> Data Privacy <https://www.key-work.de/de/footer/datenschutz.html>
> Phone: +49-721-78203-264 | E-Mail: [email protected]
>
> Key-Work Consulting GmbH, Karlsruhe, HRB 108695, HRG Mannheim
> Managing Director: Petra Wotring
