Yep.  Though I'd still say we should first look at making this
targeted, so that only certain processors disallow cron scheduling.
MergeContent is one example: it is not designed to work well unless it
consistently gets a chance to crunch numbers/data. An annotation like
@CronSchedulingDisabled or something...  I do see cron scheduling in
the middle of a flow as a bit of an outlier, but it is still a pretty
clean way of interjecting time/schedule-based flow control when one
needs it, like John mentioned.

Thanks

On Wed, Feb 22, 2023 at 8:46 AM Mark Payne <[email protected]> wrote:
>
> Jens,
>
> In this case it would make perfect sense to use Cron for the ListFile. 
> FetchFile would then be Timer Driven. The point here is around CRON driven 
> processors in the middle of the flow.
>
> Thanks
> Mark
>
>
> On Feb 22, 2023, at 10:17 AM, Jens M. Kofoed <[email protected]> wrote:
>
> 
> Hi Mark
> We have many List/Get processors that run via cron. Some systems 
> export data to disk every hour, but the systems can't block read access to the 
> files while writing them. So NiFi can pull the same file multiple times and 
> try to delete it while the file is still being written. But we know that the export 
> only takes 10 minutes. Therefore we use a CRON to get files between 0 0 15-55 
> * *
> We have similar issues with other systems that only provide data or are 
> accessible at specific time slots.
>
> To John:
> Could you use a Notify/Wait gate, where a Wait processor blocks 
> FlowFiles from reaching the MergeContent processor? In another flow, use a 
> GenerateFlowFile and a Notify processor to open the gate (the Wait processor). After 
> the MergeContent you could have a Notify processor to close the gate again.
> In this way, you would get many FlowFiles into the MergeContent processor at the 
> same time.
>
> Kind regards
> Jens M. Kofoed
>
>
> On Feb 22, 2023, at 3:24 PM, Mark Payne <[email protected]> wrote:
>
> Interesting. Thanks for that feedback Harald. It might make sense to be more 
> surgical about this, disabling it for MergeContent, for example, instead of 
> all interflow processors.
>
> Thanks
> -Mark
>
>
> On Feb 22, 2023, at 5:42 AM, Dobbernack, Harald (Key-Work) 
> <[email protected]> wrote:
>
>
> Just responding to this part:
>
> You should not be using CRON driven for any processors in the middle of a 
> flow. In fact, we really should probably just disable that altogether.
>
> Please don't disable this! We actually use CRON for some of our PutSFTP 
> processors, as these SFTP servers have service times that we are supposed to 
> respect by not using them, or during which they will not be available at all. Of course we 
> can also route to a Wait processor if we have arrived at a time when 
> the destination should not be called, but it is so much simpler to be able to 
> tell the processor in the middle of the flow when not to run.
>
>
> -----Original Message-----
>
> From: Mark Payne <[email protected]>
>
> Sent: Tuesday, February 21, 2023 21:37
>
> To: [email protected]; John McGinn <[email protected]>
>
> Subject: Re: Processor with cron scheduling in middle of flow
>
>
>
> John,
>
>
> You should not be using CRON driven for any processors in the middle of a 
> flow. In fact, we really should probably just disable that altogether.
>
> In fact, it’s exceedingly rare that you’d want anything other than 
> Timer-Driven with a Run Schedule of 0 sec.
>
> MergeContent will not create any merged output on its first iteration after 
> it’s scheduled to run. It requires at least a second iteration before 
> anything is transferred. Its algorithm has evolved over time, and it may well 
> have happened to work previously but it’s really not being configured as 
> intended.
>
>
> When you say that you’re retrieving data from a few sources and then “merges 
> that all back into a single file” - does that mean that you started with one 
> FlowFile, split it up, and then want to re-assemble the data after performing 
> enrichment? If so you’ll want to use a Merge Strategy of Defragment.
>
>
> If you’re trying to just bring in some data and merge it together by 
> correlation attribute, then Bin Packing makes sense. Here, you have a few 
> properties that you can use to try to get the best bin packing. In short, a 
> bin will be merged when any of these conditions is met:
>
>
> - The Minimum Group Size is reached AND the Minimum Number of Entries is met
>
> - The Maximum Group Size OR the Maximum Number of Entries is met
>
> - A bin has been sitting for “Max Bin Age” amount of time
>
> - If a correlation attribute is used, and a FlowFile comes in that can’t go 
> into any bin, it will evict the oldest bin.
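
The first three conditions in the list above can be read as a simple predicate over one bin. The following is only a minimal sketch of that reading, not MergeContent's actual code: `BinState`, `BinLimits` and `merge_reason` are hypothetical names, the default values are assumptions, and the correlation-attribute eviction case is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BinState:
    """Hypothetical snapshot of one bin; not a NiFi class."""
    entries: int          # number of FlowFiles currently in the bin
    size_bytes: int       # total content size of the bin
    age_seconds: float    # time since the bin was started


@dataclass
class BinLimits:
    """Hypothetical stand-ins for the properties named in the list above."""
    min_entries: int = 1                          # assumed default
    min_size_bytes: int = 0                       # assumed default
    max_entries: Optional[int] = None
    max_size_bytes: Optional[int] = None
    max_bin_age_seconds: Optional[float] = None


def merge_reason(state: BinState, limits: BinLimits) -> Optional[str]:
    """Return the name of the condition that would trigger a merge, or None."""
    # Either maximum being reached is enough on its own (OR).
    if limits.max_entries is not None and state.entries >= limits.max_entries:
        return "max entries"
    if limits.max_size_bytes is not None and state.size_bytes >= limits.max_size_bytes:
        return "max size"
    # A bin that has been sitting too long is merged regardless of its size.
    if (limits.max_bin_age_seconds is not None
            and state.age_seconds >= limits.max_bin_age_seconds):
        return "max bin age"
    # Both minimums must be satisfied together (AND).
    if state.entries >= limits.min_entries and state.size_bytes >= limits.min_size_bytes:
        return "minimums met"
    return None
```

Under the assumed defaults, a bin holding a single FlowFile already satisfies both minimums, so `merge_reason(BinState(1, 0, 0.0), BinLimits())` returns `"minimums met"`. If the real configuration behaves similarly, raising the minimums and setting a Max Bin Age would be the settings to experiment with.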
>
>
> If you’re seeing bins smaller than expected, you can look at the Data 
> Provenance for the merged FlowFile, and it will tell you exactly which of the 
> conditions above triggered the data to be merged. This may help to adjust 
> these settings.
>
>
> Hope this is helpful.
>
>
> Thanks
>
> -Mark
>
>
>
> On Feb 17, 2023, at 1:39 PM, John McGinn via users <[email protected]> 
> wrote:
>
>
> Hello,
>
>
> NiFi 1.19.0 - I need some help in trying to make my idea work, or figure out 
> the better way to do this.
>
>
> I've got a flow that retrieves data from a few data sources, enhances 
> individual flow files, converts attributes to CSV and then merges that all 
> back into a single file. It takes roughly 20 minutes for the process to run 
> from start to the MergeContent part, so when I do it manually, I stop the 
> MergeContent processor until all flowfiles are in the queue waiting, and then 
> I start the MergeContent processor. (Run One Time doesn't work for some 
> reason.) That works fine, manually.
>
>
> When I try to put cron scheduling in, it never kicks off. For instance, the 
> initial processor in the flow has a cron schedule of the top of the hour. (0 
> 0 * * * ?) I then put 25 past the hour for Merge Content (0 25 * * * ?). When 
> I start the flow, the flowfiles are generated and queue up in front of 
> MergeContent by 25 minutes past the hour, but the MergeContent never kicks 
> off.
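
The two expressions quoted above have six fields each, which matches the Quartz-style layout where the first field is seconds rather than minutes. As a rough aid for reading them, here is a small sketch that only labels the fields; `label_cron_fields` is a hypothetical helper, and it does not validate or evaluate the schedule.

```python
# Field order of a Quartz-style cron expression; the seventh field (year) is optional.
QUARTZ_FIELDS = ("seconds", "minutes", "hours", "day_of_month", "month", "day_of_week", "year")


def label_cron_fields(expression: str) -> dict:
    """Split a Quartz-style cron expression and name each field."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}: {expression!r}")
    # zip stops at the shorter sequence, so "year" only appears when given.
    return dict(zip(QUARTZ_FIELDS, parts))


if __name__ == "__main__":
    for expr in ("0 0 * * * ?", "0 25 * * * ?"):
        print(expr, "->", label_cron_fields(expr))
```

Read this way, `0 0 * * * ?` fires at second 0 of minute 0 of every hour, and `0 25 * * * ?` at second 0 of minute 25 of every hour, which is what the message describes.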
>
>
> I added a correlation attribute recently and removed the cron entry, but the 
> MergeContent just creates small bunches of merged files.
>
>
> I even attempted to put a cron on the AttributesToCSV with a maximum bin age 
> on the Merge Content, since it takes less than a minute for the 
> AttributesToCSV to process the flowfiles at that point, but the cron didn't 
> kick off there either.
>
>
> Any ideas on how to get this to work?
>
>
> Thanks,
>
> John
>
>
>
>
> Harald Dobbernack
>
>
>
